Abstract

Under which conditions and with which distortions can we preserve the pairwise distances
of low-complexity vectors, e.g., for structured sets such as the set of sparse vectors or the
one of low-rank matrices, when these are mapped (or embedded) into a finite set of vectors?

This work addresses this general question through the specific use of a quantized and
dithered random linear mapping which combines, in the following order, a sub-Gaussian
random projection in $\mathbb R^M$ of vectors in $\mathbb R^N$, a random translation, or dither, of the projected
vectors, and a uniform scalar quantizer of resolution $\delta > 0$ applied componentwise.

Thanks to this quantized mapping we are first able to show that, with high probability,
an embedding of a bounded set $\mathcal K \subset \mathbb R^N$ in $\delta\mathbb Z^M$ can be achieved when distances in the
quantized and in the original domains are measured with the $\ell_1$- and $\ell_2$-norm, respectively,
and provided the number of quantized observations $M$ is large compared to the square of the
“Gaussian mean width” of $\mathcal K$. In this case, we show that the embedding is actually quasi-isometric
and only suffers from both multiplicative and additive distortions whose magnitudes
decrease as $M^{-1/5}$ for general sets, and as $M^{-1/2}$ for structured sets, when $M$ increases.

Second, when one is only interested in characterizing the maximal distance separating two
elements of $\mathcal K$ mapped to the same quantized vector, i.e., the “consistency width” of the
mapping, we show that for a similar number of measurements and with high probability this
width decays as $M^{-1/4}$ for general sets and as $1/M$ for structured ones when $M$ increases.

Finally, as an important aspect of our work, we also establish how the non-Gaussianity
of sub-Gaussian random projections inserted in the quantized mapping (e.g., for Bernoulli
random matrices) impacts the class of vectors that can be embedded or whose consistency
width provably decays when $M$ increases.


1 Introduction 

There exists an ever-growing trend in high (or “big”) dimensional data processing to design new
procedures (or to simplify existing ones) using linear dimensionality reduction (LDR) methods
in order to get faster or memory-efficient algorithms. Provided this reduction does not bring
too much distortion between the initial data space and the “reduced” domain, as often allowed
by the intrinsic “low-dimensionality” properties of the input data, many techniques, such as
nearest-neighbor search in big databases [?], classification [5], regression [38], filtering [?],
manifold processing [7] or compressed sensing [?], can be developed in this reduced domain
with controlled loss of accuracy, as well as stability with respect to data corruption (e.g., noise).
catholique de Louvain (UCL), Belgium. The author is funded by the Belgian National Science Foundation (F.R.S.-FNRS).



Most often, those LDR tools rely on defining a random projection matrix (sometimes called
sensing matrix) with fewer rows $M$ than columns $N$, whose multiplication with data represented
as a set of vectors in $\mathbb R^N$ provides a reduced representation (or sketch) of the latter. This is the
scheme implicitly promoted for instance by the celebrated Johnson-Lindenstrauss (JL) lemma
for finite sets of vectors $\mathcal S \subset \mathbb R^N$, i.e., with $|\mathcal S| < \infty$ [?]. This cornerstone result and its
subsequent developments [?] showed that, given a resolution $\epsilon > 0$, if $M \ge C\epsilon^{-2}\log S$, where
$S = |\mathcal S|$ is the cardinality of $\mathcal S$ and $C > 0$ is a general constant, then a random matrix $\Phi \in \mathbb R^{M\times N}$
whose entries are independently and identically distributed (i.i.d.) as a centered sub-Gaussian
distribution with unit variance defines an isometric mapping that preserves pairwise distances
between points in $\mathcal S$ up to a multiplicative distortion $\epsilon$. In other words, $\Phi$ defines an $\epsilon$-isometry
between $(\mathcal S, \ell_2)$ and $(\Phi\mathcal S, \ell_2)$, i.e., with high probability, for all $x, y \in \mathcal S$,

$$(1-\epsilon)\|x-y\| \le \tfrac{1}{\sqrt M}\|\Phi x - \Phi y\| \le (1+\epsilon)\|x-y\|. \tag{1}$$

Equivalently, keeping the probability of success constant with respect to the random
generation of $\Phi$ and inverting the requirement linking $M$ and $\epsilon$, one observes that such an
isometry has a distortion $\epsilon$ decaying as $1/\sqrt M$ when $M$ increases, i.e., this distortion vanishes
when $M/\log S$ is large. Notice that variants of this embedding result exist with different
“input/output” norms; see, e.g., [36] for a unified treatment over a family of interpolation norms
including $\ell_2$ and $\ell_1$ as special cases.
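The $\epsilon$-isometry (1) is easy to observe numerically. The following minimal sketch (Python standard library only; all helper names are hypothetical) draws one Gaussian $\Phi$ and reports the ratios $\|\Phi x - \Phi y\|/(\sqrt M\|x-y\|)$ over all pairs of a small random point set. Under the JL lemma these ratios should cluster around 1, with a spread shrinking roughly like $1/\sqrt M$. It illustrates a single draw and is not a proof of the uniform statement.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """m x n matrix (list of rows) with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def matvec(phi, x):
    """Matrix-vector product Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def norm(u):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in u))


def jl_ratios(points, m, seed=0):
    """Ratios ||Phi x - Phi y|| / (sqrt(M) ||x - y||) for all pairs of points."""
    rng = random.Random(seed)
    phi = gaussian_matrix(m, len(points[0]), rng)
    images = [matvec(phi, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            num = norm([a - b for a, b in zip(images[i], images[j])])
            den = math.sqrt(m) * norm([a - b for a, b in zip(points[i], points[j])])
            ratios.append(num / den)
    return ratios


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(10)]
    for m in (50, 400):
        r = jl_ratios(pts, m)
        print(m, round(min(r), 3), round(max(r), 3))
```

Increasing `m` should tighten the interval between the smallest and largest ratio around 1.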

The JL lemma has been later generalized to any subset $\mathcal K \subset \mathbb R^N$, not only finite, whose
typical “dimension” can be considered as small with respect to $N$ (see, e.g., [?]). In other
words, as soon as $\mathcal K$ displays some internal structure that makes it somehow parametrizable with
much fewer parameters than $N$, as for the set of sparse or compressible signals, the set of low-rank
matrices, signal manifolds, or a set given as a union of low-dimensional subspaces, an
$\epsilon$-isometry like (1) can be defined for all pairs of vectors in $\mathcal K$. This is for instance the essence
of the restricted isometry property (RIP) and its link with the JL lemma, where (1) holds with
high probability for all $K$-sparse vectors provided $M \ge CK\log(N/K)$ [?].

However, these embeddings have one strong limitation. Except in very specific situations,
such as for discrete sub-Gaussian random matrices $\Phi$ (e.g., Bernoulli) and finite sets $\mathcal K$, the set
$\Phi\mathcal K$ is not finite. An infinite number of bits is thus required if one needs to store, process
or transmit $\Phi x$ without information loss for any possible $x \in \mathcal K$. Moreover, knowing how many
bits are required to represent such projections is also theoretically important for assessing and
measuring the level of information contained in the reduced data space or for improving specific
data retrieval and processing algorithms. Additionally, if this measure of information can be
achieved, nothing prevents us from taking $M \ge N$, as the sought “dimensionality reduction” can
be aimed at minimizing the number of bits rather than the dimensionality $M$. For instance,
[3] defines locality-sensitive hashing (LSH) as a procedure to turn data vectors into quantized
hashes that preserve locality, so that close vectors induce, with high probability, close hashes.
However, this method is specifically designed for boosting nearest-neighbor searches over a finite
set of vectors and not to define an isometry similar to (1).

As a more practical solution, the embedding realized by a random projection $\Phi$ is often
followed by a scalar quantization procedure, e.g., with a uniform scalar quantizer $\mathcal Q : \mathbb R \to \delta\mathbb Z + \delta/2$
with resolution $\delta > 0$, applied componentwise on the image of $\Phi$. A direct impact of this
sequence of operations is to induce a new additive distortion in (1) related to $\delta$, as discussed
in [?]. Indeed, assuming $\Phi$ respects (1) for all $x$ and $y$ in a certain subset $\mathcal K \subset \mathbb R^N$, given a
uniform quantizer $\mathcal Q(\cdot) := \delta\lfloor\cdot/\delta\rfloor + \delta/2$ of resolution $\delta > 0$ applied componentwise on vectors of $\mathbb R^M$,
we would have $|\mathcal Q(\lambda) - \lambda| \le \delta/2$ for all $\lambda \in \mathbb R$, which implies $\|\mathcal Q(u) - u\| \le \sqrt M\delta/2$ for
any $u \in \mathbb R^M$. Therefore, a simple manipulation of (1) provides

$$(1-\epsilon)\|x-y\| - \delta \le \tfrac{1}{\sqrt M}\|\mathcal Q(\Phi x) - \mathcal Q(\Phi y)\| \le (1+\epsilon)\|x-y\| + \delta. \tag{2}$$

In other words, the quantized mapping $\mathsf A(\cdot) := \mathcal Q(\Phi\,\cdot)$ now defines a
quasi-isometric embedding between $(\mathcal K \subset \mathbb R^N, \ell_2)$ and $(\mathsf A(\mathcal K) \subset \mathbb R^M, \ell_2)$.
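The two quantization bounds used to derive (2), $|\mathcal Q(\lambda) - \lambda| \le \delta/2$ and $\|\mathcal Q(u) - u\| \le \sqrt M\delta/2$, can be checked with a short sketch of the midpoint quantizer $\mathcal Q(\lambda) = \delta\lfloor\lambda/\delta\rfloor + \delta/2$ (Python standard library; the function names are illustrative, and the vector bound is a worst case that random inputs typically do not reach).

```python
import math
import random


def quantize_midpoint(lam, delta):
    """Midpoint uniform quantizer Q(lam) = delta * floor(lam / delta) + delta / 2."""
    return delta * math.floor(lam / delta) + delta / 2


def quantization_error(u, delta):
    """Return (max_i |Q(u_i) - u_i|, ||Q(u) - u||_2) for a vector u."""
    errs = [quantize_midpoint(v, delta) - v for v in u]
    return max(abs(e) for e in errs), math.sqrt(sum(e * e for e in errs))


if __name__ == "__main__":
    rng = random.Random(0)
    m, delta = 1000, 0.3
    u = [rng.gauss(0.0, 10.0) for _ in range(m)]
    e_max, e_2 = quantization_error(u, delta)
    print(e_max, delta / 2)                # scalar error vs delta/2
    print(e_2, math.sqrt(m) * delta / 2)   # vector error vs sqrt(M) delta/2
```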

However, while (2) displays a constant additive distortion, several works in this context have
observed that such an additive error actually decays as $M$ increases. First, when distances in
the reduced space are measured with the $\ell_1$-norm and when $\mathcal Q$ is combined with a dithering¹,
a quasi-isometry similar to (2) holds with high probability for all vectors in a finite set $\mathcal K = \mathcal S$
[?]. The additive distortion then reads $c\delta\epsilon$ for some absolute constant $c > 0$, and this error
also decays as $1/\sqrt M$, as does the multiplicative error $\epsilon$. Second, when combined with universal
quantization [?], i.e., with a periodic scalar quantizer $\mathcal Q$, an exponential decay of this distortion
as $M$ grows can be reached; for the moment, this has been proved only for sparse signal sets.
Finally, recent works related to 1-bit compressed sensing (CS) have shown that for a quantization
$\mathcal Q$ reduced to a sign operator (i.e., $\mathcal Q(\Phi\,\cdot) = \operatorname{sign}(\Phi\,\cdot)$), the angular distance between any pair of
vectors of a low-dimensionality set $\mathcal K$ is close to the Hamming distance of their mappings up to
an additive error decaying as $1/M^{1/q}$ for some $q \ge 2$. This is true for random Gaussian matrices
and for the set of sparse signals [29, ?], for any sets with “low dimensionality” as measured
by their Gaussian mean width [44, ?] (see below) and even for sub-Gaussian random matrices
provided the projected vectors are not “too sparse” [2], i.e., for vectors whose $\ell_\infty$-norm is much
smaller than their $\ell_2$-norm.

Contributions: Considering these last observations, the main results of this paper show that:

(i) quasi-isometric embeddings can be obtained with high probability from scalar (dithered)
quantization after linear random projection; for such embeddings both multiplicative and
additive distortions co-exist when, as in [?], distances between mapped vectors are measured
with the $\ell_1$-norm²;

(ii) random sensing matrices for such embeddings are allowed to be generated from symmetric
sub-Gaussian distributions provided embedded vector differences are not “too sparse” (as
in the 1-bit case [2]);

(iii) the results above actually hold with high probability for any subset $\mathcal K$ of $\mathbb R^N$ as soon as
$M$ is large compared to its typical dimension, i.e., to its squared Gaussian mean width;

(iv) with high probability, the largest distance separating two consistent vectors in $\mathcal K$ (i.e.,
characterized by identical quantized mappings), which we call the consistency width,
decays when $M$ increases at a faster rate than what could be predicted by using just the
implications of a quasi-isometry. This extends to any set $\mathcal K$ the works of [28, ?], which
were valid only for sparse signals;

(v) for particular structured sets, e.g., the set of (bounded) sparse vectors or the set of
(bounded) low-rank matrices, the minimal values of $M$ necessary to specify a quantized
embedding or a small consistency width can be strongly reduced compared to those required
for a general set.

¹ That is, when the quantizer input is randomly shifted inside the quantization bin by a random translation
adjusted to the quantizer resolution [24] (see Sec. 2 and Eq. (4)).

² Notice that for binary embeddings the Hamming distance separating the binary mappings of two vectors, as
used in [25, 44], is also half of their $\ell_1$-distance.
Moreover, we aim at optimizing, whenever possible, the requirements on $M$ (e.g., with
respect to $\epsilon$ and $\delta$) that guarantee those results.

Methodology: As an important aspect of our developments, we study the conditions for
obtaining quasi-isometric embeddings of any bounded subset $\mathcal K \subset \mathbb R^N$ into $\delta\mathbb Z^M$. Following
key procedures established in other works [44, 45], the typical dimension of these sets is measured
by the Gaussian mean width, i.e.,


$$w(\mathcal K) := \mathbb E\sup_{u\in\mathcal K}|g^\top u|,$$

with $g \sim \mathcal N^N(0,1)$. This quantity, also known as Gaussian complexity, has been recognized
as central, for instance, in characterizing random processes [?], shrinkage estimators in signal
denoising and high-dimensional statistics [12], linear inverse problem solving with convex optimization
[?] or classification efficiency for randomly projected signal sets [?]. More specifically,
the minimal number of measurements $M$ necessary to induce, with high probability, an $\ell_2/\ell_2$-isometric
embedding of any subset $\mathcal K \subset \mathbb R^N$ into $\mathbb R^M$ from sub-Gaussian random projections is
known to be proportional to $w(\mathcal K)^2$ [39]. Therefore, since $w(\mathcal K)^2 \lesssim \log|\mathcal K|$ for a finite set $\mathcal K \subset \mathbb B^N$,
we recover the condition defining the Johnson-Lindenstrauss lemma by imposing $M \gtrsim \log|\mathcal K|$
[?], while for the set of bounded $K$-sparse vectors in an orthonormal basis (ONB) $\Psi \in \mathbb R^{N\times N}$,
$w(\Sigma_K^\Psi \cap \mathbb B^N)^2 \lesssim K\log(N/K)$, which characterizes the conditions of the restricted isometry property
(RIP) for sub-Gaussian random matrices [6]. The interested reader can find a summary of the
main properties of the Gaussian mean width in Table 1, with explicit references to their origin.
This table may also help keep track of these properties while reading our proofs.

In our developments, we sometimes complete the characterization of sets provided by the
Gaussian mean width with another important measure: the Kolmogorov $\epsilon$-entropy of a set
$\mathcal K \subset \mathbb R^N$, which we denote $\mathcal H(\mathcal K, \epsilon)$ [35]. This is defined as the logarithm of the size of the smallest
$\epsilon$-net of $\mathcal K$, i.e., a set $\mathcal G_\epsilon(\mathcal K) \subset \mathcal K$ such that any vector of $\mathcal K$ lies within distance $\epsilon$ of its
closest vector in $\mathcal G_\epsilon(\mathcal K)$. By the Sudakov inequality, this entropy is connected to the Gaussian
mean width as $\mathcal H(\mathcal K, \epsilon) \lesssim w(\mathcal K)^2/\epsilon^2$.
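For a finite point cloud, any $\epsilon$-net gives an upper bound on $\mathcal H(\mathcal K, \epsilon)$, since the entropy is defined through the smallest net. The sketch below (Python standard library; `greedy_eps_net` is a hypothetical helper, not a method from this work) builds such a net greedily. The greedy net is generally not the smallest one, so $\log|\text{net}|$ only upper-bounds the entropy.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_eps_net(points, eps):
    """Greedy eps-net of a finite point set.

    A point is kept if it lies farther than eps from every point kept so far,
    so every input point ends up within eps of the net and the kept points
    are pairwise more than eps apart.
    """
    net = []
    for p in points:
        if all(dist(p, q) > eps for q in net):
            net.append(p)
    return net


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
    for eps in (0.5, 0.25, 0.125):
        net = greedy_eps_net(cloud, eps)
        print(eps, len(net), math.log(len(net)))  # log|net| >= H(cloud, eps)
```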

However, in specific cases this last inequality is too loose with respect to $\epsilon$. As summarized
in [12], this is the case of the structured sets $\mathcal K$ defined hereafter, for which this work will provide
separate and tighter results.

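Definition 1 (Structured set³ [12]). A bounded set $\mathcal K \subset \mathbb R^N$ with diameter $d = \|\mathcal K\| :=
\max\{\|u\| : u \in \mathcal K\} < \infty$ is structured iff there exists a quantity $\bar w(\mathcal K)$, independent of $d$, for
which we have both

$$\mathcal H(\mathcal K, \epsilon) \lesssim \bar w(\mathcal K)^2\log\big(1 + \tfrac{d}{\epsilon}\big), \tag{3a}$$

$$w(d^{-1}\mathcal K_{,d\epsilon})^2 = w\big((d^{-1}\mathcal K - d^{-1}\mathcal K)\cap\epsilon\mathbb B^N\big)^2 \lesssim \epsilon^2\,\bar w(\mathcal K)^2, \tag{3b}$$

for any $\epsilon > 0$, where $\mathcal K_{,\epsilon'} := (\mathcal K - \mathcal K)\cap\epsilon'\mathbb B^N$ is the local set of $\mathcal K$ of radius $\epsilon' > 0$.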

For instance, if $\mathcal K'$ is a subspace of $\mathbb R^N$, a union of subspaces (such as the set of $K$-sparse
signals in an orthonormal basis or in a redundant dictionary of $\mathbb R^N$), the set of rank-$r$ matrices
$\mathcal M_r$ in $\mathbb R^{N_1\times N_2}$, or even the set of group-sparse signals, then $\mathcal K'$ is a cone, i.e., $\lambda\mathcal K' \subset \mathcal K'$ for any
$\lambda > 0$, and the set $\mathcal K := \mathcal K'\cap d\mathbb B^N$ is structured for any diameter $d > 0$ [42].

³ Notice that in [42] $\mathcal K$ is assumed to be a subset of the sphere so that $d = 1$. However, this slight
difference does not change the bound on the Kolmogorov entropy or the Gaussian mean width of the structured
sets considered in [42] and in this paper.
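General properties (name [reference]: property)

(P1) Definition: $w(\mathcal A) = \mathbb E\sup_{x\in\mathcal A}|\langle g, x\rangle|$ for $g \sim \mathcal N^N(0,1)$.
(P2) Homogeneity [13, Sec. 3.2]: $w(\lambda\mathcal A) = \lambda\,w(\mathcal A)$ for $\lambda > 0$.
(P3) Set inclusion [13, Sec. 3.2]: if $\mathcal A \subset \mathcal B$, $w(\mathcal A) \le w(\mathcal B)$.
(P4) Set difference [41, Sec. 5.3]: $w(\mathcal A - \mathcal A) \le 2\,w(\mathcal A)$.
(P5) Modularity [13, Sec. 3.2]: $w(\mathcal A\cup\mathcal B) + w(\mathcal A\cap\mathcal B) = w(\mathcal A) + w(\mathcal B)$, if $\mathcal A$, $\mathcal B$ and $\mathcal A\cup\mathcal B$ are convex.
(P6) Convex hull [13, Sec. 3.2]: $w(\operatorname{conv}(\mathcal A)) = w(\mathcal A)$.
(P7) Subspace [13, Sec. 3.2]: if $\mathcal A_K$ is a $K$-dimensional subspace of $\mathbb R^N$, then $w(\mathcal A_K\cap\mathbb S^{N-1}) = w(\mathcal A_K\cap\mathbb B^N) \le \sqrt K$.
(P8) Subspace addition [13, Eq. (15)]: $w\big((\mathcal A_K\oplus\mathcal H)\cap\mathbb B^N\big) \le \sqrt K + w(\mathcal H\cap\mathbb B^N)$.
(P9) Link with diameter*: for $\|\mathcal A\| := \sup_{u\in\mathcal A}\|u\|$, $\sqrt{2/\pi}\,\|\mathcal A\| \le w(\mathcal A) \le \sqrt N\,\|\mathcal A\|$.
(P10) Symmetrization*: $w(\mathcal A) - \inf_{u\in\mathcal A}\|u\| \le \mathbb E\sup_{x\in\mathcal A-\mathcal A}|\langle g, x\rangle| \le 2\,w(\mathcal A)$.
(P11) Translation*: $w(\mathcal A) - \|t\| \le w(\mathcal A + \{t\}) \le w(\mathcal A) + \|t\|$, for $t \in \mathbb R^N$.
(P12) Invariance under $\mathcal O_N$ [45, Prop. 2.1]: for all $B \in \mathcal O_N := \{C \in \mathbb R^{N\times N} : CC^\top = C^\top C = I_N\}$, $w(B\mathcal A) = w(\mathcal A)$.
(P13) Translation on origin (from (P10) & (P11)): $w(\mathcal A) - \|x_0\| \le w(\mathcal A - \{x_0\}) \le 2\,w(\mathcal A)$ for $x_0 \in \mathcal A$ with $\|x_0\| = \inf_{u\in\mathcal A}\|u\|$.
(P14) Sudakov inequality [32, Sec. 1.7]: for the smallest $\epsilon$-net $\mathcal G_\epsilon \subset \mathcal A$, $\log|\mathcal G_\epsilon| \lesssim \epsilon^{-2}w(\mathcal A)^2$.

Special sets (name [reference]: width)

(P15) Finite [31, Sec. 1.4]: $w(\mathcal S)^2 \lesssim \log|\mathcal S|$ for a finite $\mathcal S \subset \mathbb B^N$.
(P16) Sphere and ball [33, Sec. 1.4]: $w(\mathbb S^{N-1}) \le \sqrt N$ and $w(\mathbb B^N) \le \sqrt N$.
(P17) Sparse signals [33, Sec. 1.3]: for $\Sigma_K := \{u : \|u\|_0 \le K\}$, $w(\Sigma_K\cap\mathbb B^N)^2 \lesssim K\log(2N/K)$.
(P18) “Compressible signals” [31, Sec. 1.3]: for $\mathcal K_{N,K} := \{u : \|u\|_1 \le \sqrt K,\ \|u\| \le 1\}$, $w(\mathcal K_{N,K})^2 \lesssim K\log(2N/K)$.
(P19) Low-rank matrices [32, Lemma 21]: for $\mathcal M_r := \{U \in \mathbb R^{N_1\times N_2} : \operatorname{rank}(U) \le r\}$, $w(\mathcal M_r\cap\mathbb B^{N_1\times N_2})^2 \lesssim r(N_1+N_2)$, with $\mathbb B^{N_1\times N_2}$ the unit Frobenius-norm ball.

Table 1: Useful properties of the Gaussian mean width. If not otherwise noted, all sets are subsets of $\mathbb R^N$.
(*) (P9) is obtained by a simple use of the Jensen and Cauchy-Schwarz inequalities, and (P11) is a simple consequence
of the triangle inequality and of $\mathbb E|\langle g, t\rangle| = \sqrt{2/\pi}\,\|t\| \le \|t\|$.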
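Property (P17) can be probed by Monte Carlo, because for a fixed support $S$ the supremum of $|\langle g, u\rangle|$ over unit-norm vectors supported on $S$ equals $\|g_S\|$, so over $\Sigma_K\cap\mathbb B^N$ it is the norm of the $K$ largest entries of $|g|$. The sketch below (Python standard library; helper names are hypothetical, and the factor 4 in the check is an assumed constant for illustration) estimates $w(\Sigma_K\cap\mathbb B^N)$ and compares $w^2$ with $K\log(2N/K)$, also checking the diameter bounds of (P9).

```python
import math
import random


def sup_over_sparse_ball(g, k):
    """sup of |<g, u>| over K-sparse u with ||u|| <= 1.

    For a fixed support S the sup equals ||g_S||, so the best choice of S keeps
    the K entries of g with largest magnitude.
    """
    top = sorted((abs(v) for v in g), reverse=True)[:k]
    return math.sqrt(sum(v * v for v in top))


def mc_width_sparse(n, k, trials=200, seed=0):
    """Monte Carlo estimate of w(Sigma_K ∩ B^N) = E sup |<g, u>|."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += sup_over_sparse_ball(g, k)
    return total / trials


if __name__ == "__main__":
    n = 1000
    for k in (1, 10, 100):
        w = mc_width_sparse(n, k)
        print(k, round(w ** 2, 1), round(k * math.log(2 * n / k), 1))
```

With $K = N$ the set becomes $\mathbb B^N$ and the estimate approaches $\mathbb E\|g\| \approx \sqrt N$, in line with (P16).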


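Indeed, focusing first on (3b), if $\mathcal K'$ is one of the sets listed above, $\mathcal K'' := \mathcal K' - \mathcal K'$ is also a
cone and $\mathcal K'' \supset d^{-1}(\mathcal K - \mathcal K)$. Therefore $w(d^{-1}\mathcal K_{,d\epsilon})^2 \le w(\mathcal K''\cap\epsilon\mathbb B^N)^2 = \epsilon^2\,w(\mathcal K''\cap\mathbb B^N)^2$.
This last quantity is easily bounded since $\mathcal K''$ often shares the same structure as $\mathcal K'$, e.g., $\Sigma_K - \Sigma_K = \Sigma_{2K}$
if $\mathcal K' = \Sigma_K$, and in fact $w(\mathcal K''\cap\mathbb B^N) \simeq w(\mathcal K/\|\mathcal K\|)$, showing that $\bar w(\mathcal K)$ can be set to $w(\mathcal K/\|\mathcal K\|)$
in (3b).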

Second, for (3a), the Kolmogorov entropy of such a set $\mathcal K$ can often be tightly bounded by
decomposing it into a union of subspaces or subdomains restricted to $d\mathbb B^N$, so that a global $\epsilon$-net
of small cardinality can be reached by the union of the $\epsilon$-nets of all of these subparts [6, ?, 13],
i.e., justifying the bound $\mathcal H(\mathcal K, \epsilon) \lesssim \bar w(\mathcal K)^2\log(1 + \frac d\epsilon)$. Actually, concerning (3a), it turns out
that for all the structured sets listed above we have that either $\bar w(\mathcal K)^2 \simeq w(\mathcal K/\|\mathcal K\|)^2$ or both
$\bar w(\mathcal K)^2$ and $w(\mathcal K/\|\mathcal K\|)^2$ have the same simplified closed-form upper bound, e.g., they are both
upper bounded by $K\log(N/K)$ when $\mathcal K' = \Sigma_K$.

Thus, due to the observations made above, we will consider that $\bar w(\mathcal K)$ can be bounded
similarly to the actual Gaussian mean width $w(\|\mathcal K\|^{-1}\mathcal K)$ of the normalized set $\|\mathcal K\|^{-1}\mathcal K$, i.e.,
with the same simplified upper bound. An example of this fact for the set of bounded $K$-sparse
vectors is provided later in this work.




Paper organization: The rest of the paper is structured as follows. In Sec. 2 we define the
construction of our quantized sub-Gaussian random mapping. Additionally, this section characterizes
the sub-Gaussianity of its linear ingredient, i.e., its random projection matrix, and its
interplay with the “anti-sparse” nature of the mapped vectors. We also formalize and motivate
the main objectives of the paper, e.g., explaining the shape and the origins of the targeted
quasi-isometric embedding with its two specific distortions. Sec. 3 provides the main results
of this work, namely, (i) the possibility to create with high probability a quasi-isometric
sub-Gaussian embedding from our quantized mapping (Prop. 1), and (ii) a study of this mapping’s
consistency width behavior (Prop. 2). Sec. 4 discusses those two propositions, analyzing them in
a few specific settings in comparison with related works in the fields of dimensionality reduction
and 1-bit compressed sensing. Sec. 5 questions the necessity of dithering in the mapping $\mathsf A$
and shows, through an appropriate counterexample, that our results do not hold in full generality
without such a dither. Finally, Sec. 6 and Sec. 7 contain the proofs of Prop. 1 and Prop. 2,
respectively, the auxiliary lemmas being proved in the appendix.


Conventions: We find it useful to summarize here our mathematical notations. Domain dimensions
are denoted by capital roman letters, e.g., $M, N, \ldots$ Vectors and matrices are associated
with bold symbols, e.g., $\Phi \in \mathbb R^{M\times N}$ or $u \in \mathbb R^M$, while lowercase light letters are associated with
scalar values. The identity matrix in $\mathbb R^D$ reads $I_D$, while $\mathbb I[\mathcal A] \in \{0,1\}$ is the indicator function
of a set $\mathcal A \subset \mathbb R^N$. An “event” is a set whose definition depends on the realization of some
random variables, e.g., if $X \in \mathbb R$ is a random variable, the event $\mathcal A = \{X \ge 0\}$ has probability
$\mathbb P(X \ge 0) = \mathbb E\,\mathbb I[\mathcal A]$. The $i$-th component of a vector (or of a vector function) $u$ reads either $u_i$
or $(u)_i$, and the vector $u_i$ may refer to the $i$-th element of a set of vectors. The set of indices
in $\mathbb R^D$ is $[D] = \{1, \ldots, D\}$. The cardinality of a finite set $\mathcal J$ reads $|\mathcal J|$. For any $p \ge 1$, the
$\ell_p$-norm of $u$ is $\|u\|_p = (\sum_i |u_i|^p)^{1/p}$, with $\|\cdot\| := \|\cdot\|_2$. The “$\ell_0$-norm” of a vector $u \in \mathbb R^N$ is
$\|u\|_0 = |\operatorname{supp} u|$, with $\operatorname{supp} u = \{i : u_i \ne 0\}$ the support of $u$. The $(N-1)$-sphere in $\mathbb R^N$
is $\mathbb S^{N-1} = \{x \in \mathbb R^N : \|x\| = 1\}$, while the unit ball is denoted $\mathbb B^N = \{x \in \mathbb R^N : \|x\| \le 1\}$.

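The diameter of a bounded set $\mathcal A \subset \mathbb R^N$ is written $\|\mathcal A\| = \sup\{\|u\| : u \in \mathcal A\}$. The set of
$K$-sparse signals in $\mathbb R^N$ is defined as $\Sigma_K := \{u \in \mathbb R^N : \|u\|_0 \le K\}$, while the set of $K$-sparse
signals in an orthonormal basis (ONB) $\Psi \in \mathbb R^{N\times N}$, i.e., with $\Psi^\top\Psi = I_N$, reads $\Sigma_K^\Psi := \Psi\Sigma_K = \{\Psi u : u \in \Sigma_K\}$.

The positive thresholding function is defined by $(\lambda)_+ := \frac12(\lambda + |\lambda|)$ for any $\lambda \in \mathbb R$.
For $t \in \mathbb R$, $\lfloor t\rfloor$ (resp. $\lceil t\rceil$) is the largest (smallest) integer smaller (greater) than or equal to $t$. A random
matrix $\Phi \sim \mathcal P^{M\times N}(\theta)$ is an $M\times N$ matrix with entries distributed as $\Phi_{ij} \sim_{\rm iid} \mathcal P(\theta)$ given the
distribution parameters $\theta$ of $\mathcal P$ (e.g., $\mathcal N^{M\times N}(0,1)$ or $\mathcal U^{M\times N}([0,1])$). A random vector in $\mathbb R^M$
following $\mathcal P(\theta)$ is defined by $u \sim \mathcal P^M(\theta)$. Given two random variables $X$ and $Y$, the notation
$X \sim Y$ means that $X$ and $Y$ have the same distribution. Since our developments do not focus
on sharp bounds, we denote by $C$, $c$, $c'$ or $c''$ (possibly large) constants whose value can change
between lines. In a few places, for simplicity, we write $f \lesssim g$ if there exists a constant $c > 0$
such that $f \le cg$, and correspondingly for $f \gtrsim g$. Moreover, $f \simeq g$ means that $f \lesssim g$ and
$g \lesssim f$. Finally, for asymptotic relations, we use the common Landau family of notations, i.e.,
the symbols $O$, $\Omega$ and $\Theta$ [?].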

2 Quantized Sub-Gaussian Random Mapping 

In this work, given a quantization resolution $\delta > 0$, we focus on the interaction between a random
projection of $\mathbb R^N$ into $\mathbb R^M$ and the following uniform (dithered) quantizer⁴ $\mathcal Q(t) = \delta\lfloor t/\delta\rfloor \in \delta\mathbb Z$,
applied componentwise on vectors in $\mathbb R^M$. In other words, for some random matrix $\Phi \in \mathbb R^{M\times N}$
whose distribution is specified below, we study the properties of the mapping $\mathsf A : \mathbb R^N \to \delta\mathbb Z^M$
with

$$\mathsf A(x) := \mathcal Q(\Phi x + \xi), \tag{4}$$

⁴ Hereafter, our developments could be adapted to any quantizer defined as $\mathcal Q'(t) = \delta\lfloor t/\delta + r_0\rfloor + q_0 \in \delta\mathbb Z + q_0$, for
some $q_0 \in [0, \delta)$ and $r_0 \in [0, 1)$, e.g., for the quantizer mentioned in the Introduction with $r_0 = 0$ and $q_0 = \delta/2$.
where $\xi \sim \mathcal U^M([0, \delta])$ is a uniform dithering that stabilizes the action of $\mathcal Q$ [?, 27].
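A minimal sketch of the mapping (4), assuming Gaussian entries for $\Phi$ as a stand-in for a generic $\mathcal N_{\rm sg,\alpha}(0,1)$ sampler (Python standard library; `draw_mapping` and `quantized_map` are hypothetical names). Each output entry is a multiple of $\delta$ and lies in $(t - \delta, t]$ for $t = (\Phi x + \xi)_i$.

```python
import math
import random


def draw_mapping(m, n, delta, seed=0):
    """Draw a Gaussian sensing matrix Phi (m x n) and a dither xi ~ U([0, delta])^m."""
    rng = random.Random(seed)
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    xi = [rng.uniform(0.0, delta) for _ in range(m)]
    return phi, xi


def quantized_map(phi, xi, x, delta):
    """A(x) = Q(Phi x + xi) with Q(t) = delta * floor(t / delta), componentwise."""
    return [
        delta * math.floor((sum(p * v for p, v in zip(row, x)) + d) / delta)
        for row, d in zip(phi, xi)
    ]


if __name__ == "__main__":
    delta = 0.5
    phi, xi = draw_mapping(8, 4, delta)
    print(quantized_map(phi, xi, [1.0, -2.0, 0.5, 0.0], delta))
```

Since $\xi_i \in [0, \delta)$, the zero vector is always mapped to $\mathsf A(0) = 0$, a fact reused in the Bernoulli example discussed below.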

We specialize the mapping $\mathsf A$ to projection (or sensing) matrices $\Phi$ with entries independently
and identically drawn from a symmetric sub-Gaussian distribution. We recall that a
random variable (r.v.) $X$ is sub-Gaussian if its sub-Gaussian norm (or $\psi_2$-norm) [52]

$$\|X\|_{\psi_2} := \sup_{p\ge1} p^{-1/2}\,(\mathbb E|X|^p)^{1/p} \tag{5}$$

is finite⁵. Examples of sub-Gaussian r.v.'s are Gaussian, Bernoulli, uniform or bounded r.v.'s,
as

$$\|X\|_{\psi_2} \le \|X\|_\infty := \inf\{t \ge 0 : \mathbb P(|X| \le t) = 1\}.$$
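Definition (5) can be evaluated numerically when the absolute moments are known in closed form. The sketch below (Python standard library; the function names and the finite grid of $p$ values are assumptions, so the result is only a grid approximation of the supremum) uses $\mathbb E|g|^p = 2^{p/2}\Gamma(\frac{p+1}{2})/\sqrt\pi$ for $g \sim \mathcal N(0,1)$ and $|X| = 1$ for a $\pm1$ Bernoulli r.v. For both, the supremum is attained at $p = 1$, giving $\sqrt{2/\pi}$ and $1$, and both exceed $1/\sqrt2$, the value of $p^{-1/2}(\mathbb E|X|^p)^{1/p}$ at $p = 2$ for any unit-variance r.v.

```python
import math


def psi2_norm(log_abs_moment, p_grid):
    """Grid approximation of (5): sup over p in p_grid of p^(-1/2) (E|X|^p)^(1/p).

    log_abs_moment(p) must return log E|X|^p.
    """
    return max(math.exp(log_abs_moment(p) / p) / math.sqrt(p) for p in p_grid)


def log_abs_moment_gaussian(p):
    """log E|g|^p for g ~ N(0, 1), using E|g|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    return 0.5 * p * math.log(2.0) + math.lgamma((p + 1) / 2) - 0.5 * math.log(math.pi)


def log_abs_moment_rademacher(p):
    """log E|X|^p for X uniform on {-1, +1}: |X| = 1, so the moment is 1."""
    return 0.0


P_GRID = [1.0 + 0.01 * k for k in range(5000)]  # p in [1, 51)

if __name__ == "__main__":
    print(psi2_norm(log_abs_moment_gaussian, P_GRID))    # attained at p = 1
    print(psi2_norm(log_abs_moment_rademacher, P_GRID))  # attained at p = 1
```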

Sub-Gaussian r.v.'s are endowed with several interesting properties described, e.g., in [52]. Their
tail is for instance bounded as the one of a Gaussian r.v., i.e., there exists a $c > 0$ such that for
all $\epsilon \ge 0$ and for a sub-Gaussian r.v. $X$,

$$\mathbb P(|X| \ge \epsilon) \le \exp\big(1 - c\,\epsilon^2/\|X\|_{\psi_2}^2\big). \tag{6}$$

Moreover, since $\|X - \mathbb EX\|_{\psi_2} \le \|X\|_{\psi_2} + \|\mathbb EX\|_{\psi_2} = \|X\|_{\psi_2} + |\mathbb EX| \le \|X\|_{\psi_2} + \mathbb E|X| \le 2\|X\|_{\psi_2}$,
centering $X$ has no effect on its sub-Gaussianity.

By a slight abuse of notation, we collectively denote the distributions of symmetric sub-Gaussian
r.v.'s with zero expectation, unit variance and finite sub-Gaussian norm $\alpha$ by $\mathcal N_{\rm sg,\alpha}(0,1)$,
with $\alpha \ge 1/\sqrt2$ from (5). This means that if $X \sim \mathcal N_{\rm sg,\alpha}(0,1)$, we do not fully specify the pdf of
$X$ but we know that $X$ is centered, has unit variance and sub-Gaussian norm $\alpha$.

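In this context, for a sub-Gaussian random matrix $\Phi = (\varphi_1, \ldots, \varphi_M)^\top \sim \mathcal N_{\rm sg,\alpha}^{M\times N}(0,1)$,
each row is also isotropic, i.e., for all $i \in [M]$ and all $u \in \mathbb R^N$,

$$\mathbb E|\langle\varphi_i, u\rangle|^2 = \|u\|^2.$$

However, contrary to the Gaussian case where $\mathbb E|\langle g, u\rangle| = (2/\pi)^{1/2}\|u\|$ for $g \sim \mathcal N^N(0,1)$ and $u \in \mathbb R^N$
(since $\langle g, u\rangle \sim \mathcal N(0, \|u\|^2)$), we do not necessarily have $\mathbb E|\langle\varphi, u\rangle| = c\|u\|$ for $\varphi \sim \mathcal N_{\rm sg,\alpha}^N(0,1)$
and some absolute constant $c > 0$.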
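For a $\pm1$ Bernoulli row, $\mathbb E|\langle\varphi, u\rangle|$ can be computed exactly by enumerating all sign patterns when $N$ is small. The sketch below (Python standard library; names are hypothetical) shows that for $u = e_1$ the first absolute moment is $1$ instead of $\sqrt{2/\pi} \approx 0.798$, while for flat unit-norm vectors $u = (1, \ldots, 1)/\sqrt N$, whose $\|u\|_\infty = 1/\sqrt N$ shrinks, the gap to the Gaussian value decreases, as quantified later in (10).

```python
import itertools
import math


def mu_rademacher(u):
    """Exact E|<phi, u>| for phi with i.i.d. entries uniform on {-1, +1}.

    Enumerates all 2^N sign patterns, so only suitable for small N.
    """
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(u)):
        total += abs(sum(s * v for s, v in zip(signs, u)))
    return total / 2 ** len(u)


def mu_gaussian(u):
    """E|<g, u>| = sqrt(2/pi) ||u|| for g ~ N(0, I_N)."""
    return math.sqrt(2 / math.pi) * math.sqrt(sum(v * v for v in u))


if __name__ == "__main__":
    for n in (1, 4, 12):
        u = [1 / math.sqrt(n)] * n  # unit norm, ||u||_inf = 1/sqrt(n)
        print(n, mu_rademacher(u), mu_gaussian(u))
```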

As will be clear below, we must anyway determine the deviations from this last equality.
Interestingly, as noted in [2], any sub-Gaussian random vector $\varphi \sim \mathcal N_{\rm sg,\alpha}^N(0,1)$ satisfies

$$\int_0^{+\infty}\big|\mathbb P(|\langle\varphi, u\rangle| \ge t) - \mathbb P(|\langle g, u\rangle| \ge t)\big|\,dt \le \kappa_{\rm sg}\|u\|_\infty, \quad \forall u \in \mathbb R^N, \tag{7}$$

for some constant $\kappa_{\rm sg} \ge 0$ depending only on the distribution of $\varphi \sim \mathcal N_{\rm sg,\alpha}^N(0,1)$. While we obviously have
$\kappa_{\rm sg} = 0$ if $\varphi \sim \mathcal N^N(0,1)$, it is possible to bound this constant in full generality. Indeed,
up to a simple change of variable $t \to t\|u\|$ in the integral, (7) is sustained by the Berry-Esseen
central limit theorem (as described in a simplified form in [21, Theorem 4.2]). This result shows
basically that, for $u \in \mathbb R^N$, the LHS of (7) is bounded by $9\,\mathbb E|\varphi_1|^3\,\|u\|_3^3/\|u\|^2 \le 9\sqrt{27}\,\alpha^3\|u\|_\infty$ for
$\varphi_1 \sim \mathcal N_{\rm sg,\alpha}(0,1)$. This means that $\kappa_{\rm sg} \lesssim \alpha^3$ for any $\varphi \sim \mathcal N_{\rm sg,\alpha}^N(0,1)$. Notice,
however, that this bound can be loose for many sub-Gaussian distributions.

Thanks to assumption (7), we can establish the behavior of the first absolute moment function

$$\mu_{\rm sg}(u) := \mathbb E|\langle\varphi, u\rangle|. \tag{8}$$

⁵ Notice that other equivalent definitions for sub-Gaussian r.v.'s exist; see, e.g., [39].
Since $\mathbb E|X| = \int_0^\infty\mathbb P(|X| \ge t)\,dt$ for any r.v. $X$ and using Jensen's inequality, we indeed observe that

$$\mu_{\rm sg}(u) \le (\mathbb E|\langle\varphi, u\rangle|^2)^{1/2} = \|u\|, \tag{9}$$

$$\big|\mu_{\rm sg}(u) - (\tfrac{2}{\pi})^{1/2}\|u\|\big| \le \kappa_{\rm sg}\|u\|_\infty, \tag{10}$$

for all $u \in \mathbb R^N$. The last property, which is also considered in 1-bit CS with non-Gaussian
projections [2], is key for characterizing quantized embeddings from sub-Gaussian projections.


Having now fully described the elements composing our random quantized mapping $\mathsf A$, we
formally address the objectives defined in the Introduction by observing “when”, i.e., under
which conditions with respect to $M$, there exist two small distortions $\Delta^{\otimes}, \Delta^{\oplus} \ge 0$ such that
the pseudo-distance $\mathcal D(x,y) := \frac1M\|\mathsf A(x) - \mathsf A(y)\|_1$ is involved in the quasi-isometric relation

$$\big|\mathcal D(x,y) - (\tfrac{2}{\pi})^{1/2}\|x-y\|\big| \le \Delta^{\otimes}\|x-y\| + \Delta^{\oplus}, \tag{11}$$

for all pairs of vectors taken in a general subset $\mathcal K \subset \mathbb R^N$.

In particular, we aim to control the distortions $\Delta^{\oplus}$ and $\Delta^{\otimes}$ with respect to $M$, $N$, the non-Gaussian
nature of $\Phi$ (i.e., through $\alpha$ and $\kappa_{\rm sg}$), the typical dimension of $\mathcal K$ (i.e., its Gaussian
mean width) and possible additional requirements on $x$ and $y$.


Let us justify and comment on the specific form taken by (11). First, $\mathcal D$ is associated with an $\ell_1$-distance
in the image of $\mathsf A$. As detailed later, this choice establishes an equivalence between
the evaluation of $\mathcal D$ and a specific counting procedure, i.e., a count of the number of quantization
thresholds separating each component of the randomly projected vectors. However, it is not
clear whether our developments can be extended to an $\ell_2$-based pseudo-distance, even if this holds, with
additional distortion, in the case of Gaussian random projections and for finite sets $\mathcal K$ [27].


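Second, as explained in the Introduction, a special case where both non-zero $\Delta^{\oplus}$ and $\Delta^{\otimes}$
appear specifies the constant $(2/\pi)^{1/2}$ in (11). When $\Phi \sim \mathcal N^{M\times N}(0,1)$, [27] has proved a quantized
version of the Johnson-Lindenstrauss (JL) lemma showing that for a finite set $\mathcal S \subset \mathbb R^N$ of size
$S$, provided $M \gtrsim \epsilon^{-2}\log S$, one has

$$\big|\mathcal D(x,y) - (\tfrac{2}{\pi})^{1/2}\|x-y\|\big| \lesssim \epsilon\|x-y\| + \epsilon\delta,$$

for all pairs $x, y \in \mathcal S$ with high probability. As a direct impact of the loss of
information induced by the quantization, we also observe here that $\mathsf A$ realizes a quasi-isometric
mapping between $(\mathcal S \subset \mathbb R^N, \ell_2)$ and $(\mathsf A(\mathcal S) \subset \delta\mathbb Z^M, \ell_1)$ with $\Delta^{\otimes} = \epsilon$ and $\Delta^{\oplus} = \delta\epsilon$.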

Finally, as will be clearly established in Sec. 3.1, the anti-sparse nature of $x - y$ must be
involved in the characterization of the right-hand side of (11) in the case of a general sub-Gaussian
matrix $\Phi$. Indeed, let us consider a matrix with i.i.d. Bernoulli distributed random
entries, i.e., $\Phi_{ij} \sim_{\rm iid} \mathcal B(\{\pm1\}, \frac12)$ with $\mathbb P(\Phi_{ij} = 1) = \mathbb P(\Phi_{ij} = -1) = 1/2$ for all $1 \le i \le M$ and
$1 \le j \le N$, the vectors $x = (1, 0, \ldots, 0) \in \mathbb R^N$ and $y = 0 \in \mathbb R^N$, and assume $x, y \in \mathcal K$, e.g.,
with $\mathcal K = \Sigma_K\cap\mathbb B^N$ and $K \ge 1$. Then, taking $\delta = 1$, we clearly have $\mathsf A(x) \in \{\pm1\}^M$ and
$\mathsf A(y) = 0$, so that $\mathcal D(x,y) = 1$ and $\|x - y\| = 1$. Consequently, if (11) is expected to hold on
any pair of vectors in $\mathcal K$, inserting $x$ and $y$ inside it gives $\Delta^{\oplus} + \Delta^{\otimes} \ge 1 - (2/\pi)^{1/2} > 0.202$. This
limits our hope to make $\Delta^{\oplus} + \Delta^{\otimes}$ as small as we want by, e.g., increasing $M$.
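The Bernoulli counterexample can be replayed directly. In the sketch below (Python standard library; `bernoulli_counterexample` is a hypothetical helper), $x = e_1$, $y = 0$, $\delta = 1$ and $\Phi$ has i.i.d. $\pm1$ entries. Since $\mathsf A(x)_i = \lfloor\pm1 + \xi_i\rfloor = \pm1$ and $\mathsf A(y)_i = \lfloor\xi_i\rfloor = 0$, the pseudo-distance equals 1 for every draw and every $M$, so increasing $M$ cannot reduce the gap to $(2/\pi)^{1/2}$.

```python
import math
import random


def bernoulli_counterexample(m, n, seed=0, delta=1.0):
    """D(x, y) = (1/M) ||A(x) - A(y)||_1 for x = e_1, y = 0 and a +/-1 Bernoulli Phi."""
    rng = random.Random(seed)
    phi = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]
    xi = [rng.uniform(0.0, delta) for _ in range(m)]

    def A(v):
        # A(v) = Q(Phi v + xi) with Q(t) = delta * floor(t / delta)
        return [delta * math.floor((sum(p * c for p, c in zip(row, v)) + d) / delta)
                for row, d in zip(phi, xi)]

    x = [1.0] + [0.0] * (n - 1)
    y = [0.0] * n
    return sum(abs(a - b) for a, b in zip(A(x), A(y))) / m


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, bernoulli_counterexample(m, n=16))  # equals 1 for every M
    print(1 - math.sqrt(2 / math.pi))                # lower bound on the total distortion
```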

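In fact, between the two distortions, it is actually $\Delta^{\otimes}$ that should depend on the configuration
of $x - y$. As proved in the appendix,

$$\mathbb E\,\big|\lfloor x + \xi\rfloor - \lfloor y + \xi\rfloor\big| = |x - y|, \quad \forall x, y \in \mathbb R,\ \xi \sim \mathcal U([0,1]). \tag{12}$$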
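Identity (12) can be checked exactly rather than by sampling: as a function of $\xi \in [0, 1)$, the integrand $|\lfloor a + \xi\rfloor - \lfloor b + \xi\rfloor|$ is piecewise constant and only jumps where $a + \xi$ or $b + \xi$ crosses an integer. The sketch below (Python standard library; `expected_floor_gap` is a hypothetical name, and scalars are called `a`, `b` to avoid confusion with vectors) integrates it piece by piece.

```python
import math
import random


def expected_floor_gap(a, b):
    """Exact E|floor(a + xi) - floor(b + xi)| for xi ~ U([0, 1)).

    The integrand is piecewise constant in xi and only jumps at
    xi = (-a) mod 1 and xi = (-b) mod 1, so we integrate interval by interval.
    """
    cuts = sorted({0.0, (-a) % 1.0, (-b) % 1.0, 1.0})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)  # any interior point gives the constant value
        total += (hi - lo) * abs(math.floor(a + mid) - math.floor(b + mid))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
        print(expected_floor_gap(a, b), abs(a - b))
```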
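Therefore, by definition of $\mathcal Q$, since all components of $\mathsf A$ are independent and identically distributed, and using the law
of total expectation over $\xi$ and $\Phi$, we have

$$\mathbb E\,\mathcal D(x,y) = \mathbb E\,\big|\mathcal Q(\varphi^\top x + \xi) - \mathcal Q(\varphi^\top y + \xi)\big| = \mathbb E\,|\varphi^\top(x-y)| = \mu_{\rm sg}(x-y), \tag{13}$$

with $\varphi \sim \mathcal N_{\rm sg,\alpha}^N(0,1)$ and $\xi \sim \mathcal U([0,\delta])$. From the assumption (10) and given $K_0 > 0$, we then
observe that

$$\big|\mathbb E\mathcal D(x,y) - (\tfrac{2}{\pi})^{1/2}\|x-y\|\big| = \big|\mu_{\rm sg}(x-y) - (\tfrac{2}{\pi})^{1/2}\|x-y\|\big| \le \kappa_{\rm sg}\|x-y\|_\infty \le \tfrac{\kappa_{\rm sg}}{\sqrt{K_0}}\|x-y\|, \tag{14}$$

for all vectors $x$ and $y$ such that $x - y$ belongs to the set⁶

$$\reflectbox{$\Sigma$}_{K_0} := \{u \in \mathbb R^N : \sqrt{K_0}\,\|u\|_\infty \le \|u\|\}. \tag{15}$$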


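This last set amounts to considering vectors that are not “too sparse”: if $u \in \reflectbox{$\Sigma$}_{K_0}$ with $u \ne 0$, then
$\|u\|^2 \le \|u\|_0\|u\|_\infty^2 \le \|u\|_0\|u\|^2/K_0$, i.e., $\|u\|_0 \ge K_0$, which determines our notation $\reflectbox{$\Sigma$}_{K_0}$ as opposed to $\Sigma_K$. However, the converse is not
true: having $\|u\|_0 \ge K_0$ does not imply $u \in \reflectbox{$\Sigma$}_{K_0}$. Since belonging to $\reflectbox{$\Sigma$}_{K_0}$ prevents sparsity, we say that a vector $u \in \reflectbox{$\Sigma$}_{K_0}$
is an anti-sparse vector of level $K_0 > 0$.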
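Membership in the anti-sparse set (15) is a one-line test. The sketch below (Python standard library; `is_anti_sparse` and `support_size` are hypothetical helpers) checks the implication $u \in \reflectbox{$\Sigma$}_{K_0} \Rightarrow \|u\|_0 \ge K_0$ on random partially sparse vectors, and exhibits a vector with $\|u\|_0 = 5$ that is not anti-sparse of level 5, illustrating that the converse fails.

```python
import math
import random


def is_anti_sparse(u, k0):
    """Membership in the anti-sparse set of level K0: sqrt(K0) ||u||_inf <= ||u||."""
    linf = max(abs(v) for v in u)
    l2 = math.sqrt(sum(v * v for v in u))
    return math.sqrt(k0) * linf <= l2


def support_size(u):
    """||u||_0, the number of nonzero entries."""
    return sum(1 for v in u if v != 0)


if __name__ == "__main__":
    w = [1.0] * 16 + [0.0] * 16
    print(is_anti_sparse(w, 16), is_anti_sparse(w, 17))
    v = [10.0, 1.0, 1.0, 1.0, 1.0]  # 5 nonzeros but dominated by one entry
    print(support_size(v), is_anti_sparse(v, 5))
    rng = random.Random(0)
    dense = [rng.gauss(0.0, 1.0) for _ in range(100)]
    print(max(k for k in range(1, 101) if is_anti_sparse(dense, k)))  # largest level
```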


Actually, (14) states that, for vectors $x - y \in \reflectbox{$\Sigma$}_{K_0}$, the expectation of $\mathcal D(x,y)$ is close to the
one obtained with Gaussian random projections, i.e., close to the expectation $(2/\pi)^{1/2}\|x - y\|$
associated with $\kappa_{\rm sg} = 0$. Thus, if we expect to show that, for all vectors $x$ and $y$ in $\mathcal K$, $\mathcal D(x,y)$
concentrates around $(2/\pi)^{1/2}\|x - y\|$, we must take into account the anti-sparse nature of the
difference $x - y$, i.e., we need to enforce this vector to belong to $\reflectbox{$\Sigma$}_{K_0}$ for a sufficiently
large $K_0$.


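Combining these three observations, and anticipating the next section, we can now
refine the meaning of (11). We are actually going to show that, if $M$ is bigger than some $M_0$
growing with the typical dimension of $\mathcal K$ and decreasing with $\epsilon$ (see Sec. 3), then, with high
probability,

$$\Big(\big(\tfrac{2}{\pi}\big)^{1/2} - \epsilon - \tfrac{\kappa_{\rm sg}}{\sqrt{K_0}}\Big)\|x-y\| - c\epsilon\delta \le \mathcal D(x,y) \le \Big(\big(\tfrac{2}{\pi}\big)^{1/2} + \epsilon + \tfrac{\kappa_{\rm sg}}{\sqrt{K_0}}\Big)\|x-y\| + c\epsilon\delta,$$

for all $x, y \in \mathcal K$ with $x - y \in \reflectbox{$\Sigma$}_{K_0}$.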

Remark: As will become clear later, our developments benefit from the tools and techniques developed
in [44], where it is shown that, for a 1-bit mapping $\mathsf A' : \mathbb R^N \to \{\pm1\}^M$ such that $\mathsf A'(x) =
\operatorname{sign}(\Phi x)$ with a random Gaussian matrix $\Phi \sim \mathcal N^{M\times N}(0,1)$, and for the normalized Hamming
distance $\mathcal D'(x,y) = \frac1M\sum_i\mathbb I[\mathsf A'_i(x) \ne \mathsf A'_i(y)]$, one has, provided $M \gtrsim \epsilon^{-6}w(\mathcal K)^2$ and with
high probability, that for all $x, y \in \mathcal K$,

$$\Big|\mathcal D'(x,y) - \tfrac1\pi\arccos\Big(\tfrac{\langle x, y\rangle}{\|x\|\,\|y\|}\Big)\Big| \le \epsilon.$$

Our extension to non-Gaussian sensing matrices is also inspired by similar developments realized
in [2] for binary mappings and other generalized linear models.
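The 1-bit relation in the remark rests on the fact that a Gaussian hyperplane separates $x$ and $y$ with probability $\theta/\pi$, where $\theta$ is their angle. The sketch below (Python standard library; `hamming_vs_angle` is a hypothetical helper) compares the normalized Hamming distance of $\operatorname{sign}(\Phi x)$ and $\operatorname{sign}(\Phi y)$ with $\frac1\pi\arccos(\cdot)$ for one pair of vectors; it illustrates concentration for a fixed pair, not the uniform statement over $\mathcal K$.

```python
import math
import random


def sign(t):
    """sign with the convention sign(0) = +1."""
    return 1 if t >= 0 else -1


def hamming_vs_angle(x, y, m, seed=0):
    """Normalized Hamming distance between sign(Phi x) and sign(Phi y) for a
    Gaussian Phi with m rows, together with the angular distance arccos(.)/pi."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(m):
        g = [rng.gauss(0.0, 1.0) for _ in range(len(x))]  # one row of Phi
        sx = sign(sum(a * b for a, b in zip(g, x)))
        sy = sign(sum(a * b for a, b in zip(g, y)))
        disagreements += sx != sy
    cos = sum(a * b for a, b in zip(x, y)) / (
        math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return disagreements / m, math.acos(cos) / math.pi


if __name__ == "__main__":
    print(hamming_vs_angle([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], m=20000))
```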


3 Main Results 

3.1 Quasi-Isometric Quantized Embedding 

In regard to the context explained in the previous section, our first main result can be stated
as follows.

⁶ That could be pronounced “amgis”.









Proposition 1 (Quantized sub-Gaussian quasi-isometric embedding). Given $\delta > 0$, $\epsilon \in (0,1)$,
$K_0 > 0$, a bounded subset $\mathcal K \subset \mathbb R^N$ and a sub-Gaussian distribution $\mathcal N_{\rm sg,\alpha}$ respecting (10) for
$0 \le \kappa_{\rm sg} < \infty$, there exist some values $c, c' > 0$, only depending on $\alpha$, such that, if


$$M \gtrsim \frac{w(\mathcal K)^2}{\delta^2\epsilon^5} \tag{16}$$


for a general set $\mathcal K$, or

M > ^^(/C)2log(l +]&), (17) 

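for structured sets $\mathcal K$ (see Def. 1 for the definition of $\bar w$), such as the set of bounded $K$-sparse
signals or the one of bounded rank-$r$ matrices, then, for $\Phi \sim \mathcal N_{\rm sg,\alpha}^{M\times N}(0,1)$, a dithering $\xi \sim
\mathcal U^M([0,\delta])$ and the associated quantized mapping $u \in \mathbb R^N \mapsto \mathsf A(u) = \mathcal Q(\Phi u + \xi)$, we have, with
probability at least $1 - e^{-c'\epsilon^2M}$ and for all pairs $x, y \in \mathcal K$ with $x - y \in \reflectbox{$\Sigma$}_{K_0}$,

$$\Big(\big(\tfrac{2}{\pi}\big)^{1/2} - \epsilon - \tfrac{\kappa_{\rm sg}}{\sqrt{K_0}}\Big)\|x-y\| - c\epsilon\delta \le \mathcal D(x,y) \le \Big(\big(\tfrac{2}{\pi}\big)^{1/2} + \epsilon + \tfrac{\kappa_{\rm sg}}{\sqrt{K_0}}\Big)\|x-y\| + c\epsilon\delta. \tag{18}$$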


In the Gaussian case, i.e., for $\Phi \sim \mathcal N^{M\times N}(0,1)$, the conditions remain the same and (18) is
simplified with $\kappa_{\rm sg} = 0$, i.e., there is no additional requirement on the anti-sparse nature of $x - y$
in (18) since $K_0$ can be set to 1 and $\reflectbox{$\Sigma$}_1 = \mathbb R^N$.
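The concentration behind Prop. 1 can be observed for a single fixed pair. The sketch below (Python standard library; `empirical_pseudo_distance` and its `rows` switch are hypothetical) draws the rows of $\Phi$ one at a time, which is equivalent to drawing the full matrix, and computes $\mathcal D(x,y)$ for $x - y = (\frac12, -\frac12, 0, \ldots, 0)$. With Gaussian rows, $\mathcal D(x,y)$ is close to $(2/\pi)^{1/2}\|x-y\| \approx 0.564$. With $\pm1$ rows this sparse difference has $K_0 \le 2$, and by (13) $\mathbb E\mathcal D(x,y) = \mu_{\rm sg}(x - y) = 0.5$, a bias that does not vanish with $M$, in line with the $\kappa_{\rm sg}/\sqrt{K_0}$ term. This is an illustration for one pair, not a check of the uniform statement over $\mathcal K$.

```python
import math
import random


def empirical_pseudo_distance(x, y, m, delta, rows="gaussian", seed=0):
    """D(x, y) = (1/M) ||A(x) - A(y)||_1 with A(v) = Q(Phi v + xi).

    Rows of Phi are drawn one at a time, either N(0, 1) or +/-1 entries,
    with a scalar dither xi_i ~ U([0, delta]) per row.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        if rows == "gaussian":
            phi = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        else:
            phi = [rng.choice((-1.0, 1.0)) for _ in range(len(x))]
        xi = rng.uniform(0.0, delta)
        ax = delta * math.floor((sum(p * v for p, v in zip(phi, x)) + xi) / delta)
        ay = delta * math.floor((sum(p * v for p, v in zip(phi, y)) + xi) / delta)
        total += abs(ax - ay)
    return total / m


if __name__ == "__main__":
    n = 10
    x = [0.5, 0.0] + [0.0] * (n - 2)
    y = [0.0, 0.5] + [0.0] * (n - 2)
    target = math.sqrt(2 / math.pi) * math.dist(x, y)
    for rows in ("gaussian", "bernoulli"):
        print(rows, empirical_pseudo_distance(x, y, 20000, 0.5, rows), target)
```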


In Prop. 1, as shown below, the constant part $\kappa_{\rm sg}/\sqrt{K_0}$ of the multiplicative distortion
appearing in both sides of (18) is unavoidable in the case of non-Gaussian projections (with
$\kappa_{\rm sg} \ne 0$). Actually, we can show that this distortion cannot decay faster than $\Omega(1/K_0)$ for
non-Gaussian (but sub-Gaussian) random matrices when the level of anti-sparsity $K_0$ of $x - y$
increases. To see this, it is sufficient to study $\mathcal D(x,y)$ for an asymptotically large $M$, i.e.,
$\mathbb E\mathcal D(x,y)$ by the law of large numbers, and to observe how the relative error between $\mathbb E\mathcal D(x,y)$
and $(2/\pi)^{1/2}\|x - y\|$ behaves when that level $K_0$ increases.


Taking d = 1 by simplicity, notice first that, from the observation made in (12), 


EV{x,y) = E<^|<^’^(a; - y)\ = y.sg{x - y), 


where ~ ;B(^)^ and /Ugg was introduced in ([^. 

Let us then take x and y such that the vector w := x — y is equal to 1 on its first Kq 
components and zero elsewhere, i.e., w £ Hkq- In this case and if $ is a random Bernoulli 
matrix, y,sg{w) is actually twice the mean absolute deviation (MAD) of a Binomial distribution 
Bin(iLo, 5 ) with Kq degrees of freedom and success probability p = 1/2 since 


/i,g(m) = = 2E|(Efii Xj) - IKq\ = 2E\Pko - E/3koI, 

with, for 1 ^ j ^ Kq and Xj := ^{ipj -£ 1) ~iid ‘B({0,1}, 1/2) a Bernoulli random variable such 
that E(Aij = 0 ) = 1 / 2 , and fixo ~ Bin(A’o, 5 )- 

However, from ISlEniES] we can show that (see App. for details) 

\E\Pko - ^(3ko\- (#/^#^| ^ C^K^\ 


for C = 1/7. Gonsequently, for our choice of w = x — y such that ||m 


VKq, this shows that 


\ET>{x,y) - {ff/^\\x - y\\\ ^ 2C\\x - y\\ K^^, 


and proves that, even if we reached an asymptotic regime in M, a multiplicative distortion 
between V{x,y) and {X)/^\\x — y\\ would remain, and this one could decay faster than 1/Kq 
when Kq increases. It is therefore unclear if our decay in 1/^/Kq is optimal. 
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To conclude this section, let us observe that Prop. improves a proof of existence of a 
quantized embedding given in [44t Theorem 1.10] where it was showed that, provided M > 
e~^‘^w{)C — /C)^, there exists an arrangement of M affine hyperplanes in and a scaling factor 
A such that 

\X'Dc{x,y) - \\x - y\\\ ^ e, 

where 2?c denotes the fraction of affine hyperplanes that separate the two vectors x and y. 

For reasons explained in Sec.[^ each element 5~^\Ai{x) — Ai{y)\i appearing in 6~^V{x, y) = 
yi? actually counts the number of parallel affine hyperplanes in normal 

to and far apart by <5, with a dithering that randomly displaces the origin. Therefore, Prop.[^ 
basically constructs, in a random fashion, an arrangement of M such parallel hyperplane bundle, 
z.e., in M different directions {y3j/||(^J|, i G [Tf]}. Considering a Gaussian matrix $ (with 
Kag =0), we have therefore proved that there with a minimal M that grows like rather than 
when e decays (as expressed in p^). This is even reduced to for pairs of vectors taken 
in a structured set. 


3.2 Consistency Width Decay 


As a second important result, we optimize the decay law (as M increases) of the distance of any 
pair of vectors x,y € fC whose difference is “not too sparse” when those are mapped by A on 
the same quantization point in (5Z'^, i.e., when they are consistent. We refer to this distance 
as the consistency width of A. 

This width could be characterized from Prop.[^when P(£c, y) = 0, which provides ||a: —y|| < 
e ~ (or if /C is a structured set) for large M respecting (16) (resp. ([ItI)), 6 fixed 

and Kb^/'/Kq small. However, focusing on the conditions guaranteeing the consistency of x 
and y, and considering all quantities fixed but M, our result below reaches the improved decay 
e = for a general set K, and e = 0(1/M) for a structured one. We prove the following 

proposition in Sec. 

Proposition 2 (Consistency width upper bound). Let us take a quantization resolution <5 > 0, 
an accuracy e G (0,1), a sub-Gaussian distribution A4g,a(0,1) respecting (10) for 0 ^ At^g < oo, 
Kq > 0 such that \/1Lq ^ 16Ks, 


depending only on a, provided 


and a bounded subset JC C 
(2+&f 


iN 


of 


pN 


M 


> 




w{icy 


For a value c > 0 


(19) 


for a general set JC, or 


M > ^u)(/C)2log(l + M^), 


for a structured set JC, the map A defined in with $ rsj A4^^^(0,l) and^ 
such that, with probability exceeding 1 — 2exp(—ceM/(l + 5)), 




( 20 ) 
,5]) is 


A{x) = A{y) 


\x - y\\ ^ e, 


( 21 ) 


for all x,y ^ JC with x — y £ ^Kq- Io, the Gaussian case, i.e., for $ ^MxiV(o,i)^ t/ie 
conditions above remain the same with = 0, i.e., with no additional requirement on the 


anti-sparse nature of x — y in (21) 


Unfortunately, we were unable to produce a convincing counter example of a pair of vectors 
both with difference not in 'F.Kq and failing to meet (21) under the conditions of Prop. 
Therefore, it is not clear if the condition x — y € is an artifact of the proof or if removing 


it could worsen then dependence in e in (19). 
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4 Discussions and Perspectives 


Before delving into the proofs of Prop. and Prop. (see Sec. and Sec. respectively), let 
us discuss their meaning and limitations, providing also some perspectives for future works. 


On the impact of the diameter of structured sets: For the structured sets considered 
in the Introduction, it is known that if the linear embedding 0 holds with high probability 
for all a;, y G /C C with some distortion e > 0, then, since Q is homogeneous, a simple 

rescaling argument proves that the same relation actually holds for all points in K,' = 
or equivalently for all points in the cone /C' if /C = /C' U I6l[39]. In particular, since such 

a linear embedding occurs with high probability for sub-Gaussian random matrices provided 
M>e-2 w{IC)‘^ [3^, this requirement remains unchanged for reaching the embedding of vectors 
in K,'. 


Obviously, in the case of a quantized embedding such as ( |18[ ), the non-linear nature of 
Q prevents this rescaling argument from holding. However, an interesting phenomenon occurs 
anyway in this case through the requirements pT] ) and ( [^ of Prop. and Prop. § respectively. 
Indeed, we see there that the diameter of the set /C has only a logarithmic impact on the minimal 
value of M needed for these propositions to hold, since w does not depend on the diameter of 
/C (see Def. and the subsequent explanations). This really slow increase approaches the scale- 
invariant requirement obtained by linear embedding of structured sets, and is anyway strikingly 
slower than the quadratic amplification of the minimal number of measurements provided by 
(16) and (19) in the case of a general set /C, as involved by (10 when JC is expanded like 
/C ^ A/C for A > 1. 


Mitigating the anti-sparsity requirement: For both propositions, we can be concerned by 
the restriction that the vector difference must be “not too sparse”, i.e., for x,y £ 1C there must 
be a sufficiently big Kq, either for having x — y £ 2xo minimizing the distortion 
in (18), or for satisfying '/Kq ^ lO^sg in Prop. However, in certain cases, it is possible to 
adapt the sensing matrix as to increase this Kq. 


Indeed, assuming without loss of generality that the vectors x — y £ K, — JC are expected to 
be “too sparse” only in = 1 when the sensing matrix is non-Gaussian [i.e., K^g 7 ^ 0), we can 
always “rotate’Q/C with an ONB 'J'o of so that elements of 1C' — JC' with JC' := ^qJC have 
a higher anti-sparse degree than those oi JC — JC, i.e., 


max{iCo : {JC' - JC') n Zkq / 0} = min 

u^K—K II '‘'^‘^iioo 

^ min = maxjiFo : (/C - /C) G 2^0 7 ^ 0} 

uGfC—IC II Iloo 


( 22 ) 


possibly trying to maximize the left hand side in the selection of 'J'o- 

Therefore, while the requirements imposed on M in Prop. and Prop. are unchanged 
between JC and JC' in Prop. (by the invariance (fHI) of w{JC) in Table[^ and since ||a:' — y'|| = 
11® — y\\ for x' = ^qx and y' = ^oy, “rotating” JC with 'J'o helps to lighten the condition 
imposed on ® — y. Moreover, this rotation is of course equivalent to directly build a sensing 
matrix to quasi-isometrically embed the set JC with the mapping A(-) := Q($^-). 

Actually, in the case where 'J' = 1 as above, a good choice for ^0 is the DCT basis, i.e., using 
the incoherence of those two bases that prevents a sparse signal to be sparse in the frequency 
domain, also taking advantage of the fast FFT-based matrix-vector multiplication offered by 
the DGT. Notice, however, that the procedure above cannot work if JC is expected to generate 

^Strictly speaking, while |det^'o| = 1, 4'o G On is a rotation only if its determinant is 1. 
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differences of vectors that are sparse in different bases, e.g., a union of incoherent bases such as 
1 and the DCT basis. In such a case, it could be hard to maximize the right-hand side of (22) 
over ’J'o- 


Interestingly, a similar procedure to the one described above has been developed recently in 
im Theorem 2.3] in the context of fast circulant binary embeddings of finite sets of vectors. The 
requirement on the anti-sparse nature of the mapped vectors is there mitigated by taking T'o as 
the product of a Hadamard transform with a diagonal matrix with random Rademacher entries, 
which can provably reduce the coherence ||’J'o'^||L/ll'“ll of too sparse u with high probability. 


Intrinsic “anti-sparse” distortion limit: We can notice that for non-Gaussian random 
measurements, the term in (18) is actually lower bounded. This is simply due to 

the relation ||n|p ^ A^||rt||^, which implies Kq ^ N whatever the properties of the vector 
u ^ 1C — 1C C M^. Consequently, 




which limits our hope to tighten the multiplicative error of quantized non-Gaussian quasi¬ 
isometric embeddings, except if one considers asymptotic regimes where N can be considered 
as being much larger than . 


Distortion regimes: As already noticed in 1211, Prop. allows us to distinguish different 
regimes of the quasi-isometric embedding. If <5 ~ 0, the quantization operator tends to the 
identity function and (18) converges to a variant of the RIP generalized to any sets 1C 

as characterized in 


and to sub-Gaussian random matrices, as characterized in j44l [4^ for general sets and in 
for sparse signal sets only. For 6 ^ 2||/C|| the embedding becomes purely quasi-isometric and, 
keeping the context defined in Prop. [^ (18) involves 


§)%\\x-y\\-c{e5+^J 




V{x,y) ^ QrpWx - y\\ + c{e6 + ^) 


(23) 


for some absolute constant c > 0. However, in this case, the quantization becomes essentially 
binary. In fact, it is exactly binary for random matrices whose entries are generated from a 
bounded symmetric sub-Gaussian distribution, i.e., from ip ~ A4g,a(0,1) with ||(^||oo ^ F for 
some F > 0. In this case, since K, is assumed bounded, for all u £ 1C, |($w)j| ^ HH/CH and 
the components of A{u) = Q{^u -|- with ^ ~ U^{[0,6]) can only take two values, e.g., 
{—1,0} if 0 G /C. Moreover, if ip is unbounded and 0 G 1C, its sub-Gaussian nature is so that 
the fraction of quantized measurements that do not belong to {—1,0} can be made arbitrarily 
close to 0 when 6 increases. In conclusion, similarly to [33], we have basically defined a one- 
bit quantized embedding that preserves the norm of the projected vectors, as opposed to the 
mapping A'(-) = sign ($ •) that loses this information |29l I46j . Notice there that the role of our 
dithering can be compared to the one of the threshold inserted in the sign quantization in |33| . 
Gonversely to that work, however, we do not provide any algorithm to reconstruct a signal from 
its quantized mapping by A. 


Towards an ^ 2|^2 quasi-isometric embedding? It is not clear if Prop. could be turned 
into a quasi-isometric embedding between (/C C and {A{1C) C 67,^,£ 2 )- As said earlier, 

for Gaussian random matrices and for finite sets /C, an approximate quasi-isometric embedding 
can be found by integrating a non-linear distortion of the ^ 2 -distance, i.e 
I®-y II is replaced by gsiWx — y 


in (18) for = 0, 
Interestingly, 


for some non-decreasing function gs : M_|_ — )• If 
|( 75 (A) — A| = 0{V6X) for A ;§> 5 and | 5 <s(A) — (\/2A/y^)^/^| = 0(A) for A < 5, so that for small 
6 or large A, ^^(A) ps A. Therefore, as soon as ||a: — y\\ 3> 6, we get approximately a £ 2/£2 quasi¬ 
isometric embedding. Knowing if this extends to any subset JC and to sub-Gaussian random 
matrices is left for a future work. 
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Reconstructing low-complexity vectors from quantized compressive observation? 

Beyond the mere analysis of the quasi-isometric properties of our quantized mapping and closer 
to the context of quantized compressed sensing, this paper does not say anything on the re¬ 
construction algorithms that could be developed for recovering a signal x from its observations 
2 ; = Q($a;). A few algorithms exist for realizing this operation, some when 5 is small compared 
to the expected dynamic of ||$a3|| HlEaEl!, others in the 1-bit CS setting n ESI nans]. How¬ 
ever, for the first category, their stability (or convergence) does not rely on a quasi-isometric 
embedding property but rather on the restricted isometry property [niEllET] or on variations 
involving other norms [251126| . In future research, it will be appealing to find a proof of the 
instance optimality of those algorithms, e.g., for the basis pursuit dequantizer (BPDQ), using 
the quasi-isometry property promoted by Prop. even if recent interesting results show that 
an optimal “non-RIP” proof can be developed for BPDQ |20] . 

Extension to fast and universal quantized embeddings? We conclude this section by 
mentioning that it would be useful to prove Prop, [^for structured random matrices, e.g., for 
random Fourier or random Hadamard ensembles [22] , as recently obtained in im for the binary 
embedding of finite sets. This would lead to a fast computation of quantized mappings, with 
potential application in nearest-neighbor search for databases of high-dimensional signals. An 
open question is also the possibility to extend this work to universally-quantized embeddings |9l 
EnillHI, i.e., taking a periodic quantizer Q in Q. This could potentially lead to quasi-isometric 
embeddings with (exponentially) decaying distortions on vectors sets with small Gaussian width 
and using sub-Gaussian random matrices. 


5 On the necessity to dither the quantization 


Gonsidering the main results of this paper, namely Prop. and Prop. we could ask ourselves 
if a quantized mapping that would not include a dithering could also verify (18) and (21) under 
equivalent conditions on M and on the anti-sparse nature oi x — y for any vectors x,y in JC. 


The answer is, however, negative in full generality, i.e., it is possible to define a quantized and 
undithered map A : Jc —)• Q{^x) for some appropriate quantizer resolution 5 and sub-Gaussian 
random matrix $ that is incompatible with the definition of a quasi-isometric embedding with 
arbitrarily small additive distortion or with an arbitrarily small consistency width. 

To see this, let us set (^ = 1, Q(A) := argmin^^/gg |A —A'| = [A + ^J (applied componentwis^ , 
and take $ to be a Bernoulli random matrix, i.e., ^ij G {il}- Given the value > 0 
associated to the distribution of we also set arbitrarily an integer Kq such that — 

(nsg/VKo) ^ 1/2. In fact, we can coinpute that a = 1 for a Bernoulli r.v., so that Kgg ^ 
9\/^ < 47 from the bound given in Sec. ^ Therefore, Kq > (160)^ certainly works. 

We then dehne two /Co-sparse vectors u,v ^ with u equal to 1 on it hrst Kq components 
and 0 elsewhere, and r; := (1 -|- sKq^)u for some fixed 0 < |s| < 1/2. Glearly, when K ^ Kq 
these two vectors belong to the structured set K, := Hk H roB^ with ro := ^^fKo. Moreover, 
from our definition of Kq, the difference vector w := u — v = sKq^ u is adjustably “anti- 
sparse” since it lies in '^Kq with ||m|||/||m||^ = Kq. Interestingly, u and v are also consistent 
with respect to A since Q{^u) = Q{^v) = Q{^u -|- sKq^^u). This is due to the nature of 
quantization {i.e., a rounding to the closest integer) and to the fact that both G and 
llsiFj/^^rilloo ^ s < 1/2. 


®It is easy, but slightly more technical, to adapt our development here to the quantizer Q{-) = defined 

in Sec. We thus prefer to select Q as a rounding operation for the sake of clarity. 
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Let us now assume, as involved by Prop, that for e := it is possible to find M 

arbitrarily large before log(l + y) so that, with high probability and for all x,y G fC 

with X — y G [Lko j 

with the constant c > 0 defined in ( |18[ ). 

However, by taking the consistent vectors x = u and y = v, this inequality leads by 
construction to 

0 = mWMx) - A{y)\\i ^ -e-^)\\x-y\\-ce^ {l-e)\\x-y\\-ce. 

In other words, since \\x — y|| = s/^/Kq ^ s 

, > 1 \\^-v\\ > _^= Of 

^ 2c+\\x-y\\ ^ 2(c+s)\/Xo ’ 

which is a clear contradiction. We can similarly show that the same pair of consistent vectors 
X = u and y = u is incompatible with Prop. as then the consistency width cannot be 
arbitrarily small, even for asymptotically large M. 

Remark: Interestingly, the counter-example above is easily hijacked to show that it is impossible 
for the un-dithered quantized mapping A{-) := Q($ •) to respect the following property for an 
arbitrarily small e > 0 and provided M is large enough, 

{C - e - g{Ko)) \\x - y\\ - ce ^ h{Q{^x), Q{^y)), yx,y G )C with, x - y G'Jkq, 

where C, c > 0 are some universal constants, h : x —>■ M_|_ is any positive function 

vanishing on equal inputs {e.g., a norm, a pseudo-norm or any metric) and g is any monotonically 
decreasing function with limi_>+oo g{t) = 0. However, if Q is replaced by a sign operator as 
in |29l I44j . then the known binary e-stable embedding (or BeSE) relates the angular distance 
between x and y to the Hamming distance of their mappings, i.e., two distances that are equal 
to zero in our counter-example above, which removes the contradiction. 

Remark: The question whether dithering is necessary in the special case of a quantized mapping 
with a Gaussian random matrix $ remains open. 

6 Proof of Proposition 

The architecture of this proof is inspired by the one developed in [3Jj for characterizing a 1-bit 
random mapping A' : —)• {±1}'^, u G i—?■ A'(u) = sign($u). As will be clear below, 

some of the ingredients developed there had of course to be adapted to the specificities of A 
and of our scalar quantization. Compared to [H] we have also paid attention to optimize the 
dependency of M to the desired level of distortions induced by A in (|^. 

Prop. is proved as a special case of a more general proposition based on a “softer” variant 
of T>. This new pseudo-distance is established as follows. Defining the random mapping u G 
'(u) := with its component, we observe that for any x,y G 

= jj Efii - kS,<h\iy) - M)], (24) 

with the distinct sign event S{a,b) := {signa / sign6}. In words, for each i G [M], the sum 
over k above simply counts the number of thresholds in 6Z separating d>^(») = x + and 
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Figure 1: Behavior of the distance ci*(a, b) for a, 6 G R. On the top, t ^ 0 and forbidden areas determined by 
are created when counting the number of thresholds kS separating a and b. For instance, for an additional point 
c G R as on the figure, d}\a, b) = dP{c, b) — 35 but 35 = d^{a, b) = d^{c, 6) + 5 ^ d^{c, b) as c lies in one forbidden 
area. On the bottom figure, t ^ 0 and threshold counting procedure operated by d* is relaxed. Now d*{a, b) counts 
the number of limits (in dashed) of the green areas determined by 7^*, recording only one per thresholds kS, that 
separate a and b. Ffere, for e G R as on the figure, d®(a, b) = d°(e, b) = 35 but 45 = d*(e, b) = d*(a, 6 ) + 5 ^ d*(a, b). 


^\{y) = (fjy + Ci on the real line, since — k5,^\{y) — kd)] is equal to 1 for those 

and 0 for any other thresholds. 


Notice that the decomposition (24) also justifies the observation made at the end of Sec. 3.2 


namely the existence of uniform random tessellations of . Indeed, from the definition of A, 
for each i G [M], ~ ~ nlso counts the number of parallel affine 

hyperplanes Ilj := {u G : 3/c G Z, (fj u + — k5 = 0}, all normal to and far 

apart, separating x and y G In other words, is here tessellated with multiple so-called 
“hyperplane wave partitions” {n, : i G [M]} [23l [50] with random orientations, periods and 
dithered origin. 


Based on this observation, and as a generalization of an equivalent distance given in [441 
Sec. 5] for binary mappings, we introduce for some t G M the new pseudo-distance 


by defining the set 


b) = {a > t, b ^ —t} U {a < —t, b ^ t}. 

The pseudo-distance is a non-increasing function of t, with T^{a, b) = E{a, b) and 

V'^^\x,y) ^ V{x,y) ^ V~'^^\x,y). 

The behavior of P* is best understood by introducing the one-dimensional distance 


(26) 


(i*(a, 6) ;= (5 1[J^(a —/c5, 6 —/cd)] G 5N, fora, 6 gM, (27) 

so that 

n‘{x,y) = i E.=1 #,(»)). (28) 

Fig. explains how (f{a,b) evolves for positive and negative t, observing that, for each A; G Z, 
— k6, b — k6) determines forbidden or relaxed areas around the thresholds k6 separating a 
and b and counted by d!'{a,b). Moreover, the next Lemma, proved in App. provides a first 
evaluation of the impact of the distance “softening”, by observing that, essentially, d*(a, 6) is 
not very far from both |a — b\ and d^{a, b) for s close to t. 

Lemma 1. For any a, 6 G M and t, s G M, 

\d\a,b) - d%a,b)\ ^ 4((5-F |t - s|), 

|(i*(a, 6) — |a — 5| I ^ 4((5-|-|t|). 
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(29) 

(30) 

































As announced above, we aim now at proving the next proposition whose special case t = 0 
leads to Prop. [Tj 

Proposition 3. Given (5 > 0, e G (0,1), t G M, Kq > 0, a bounded subset /C C and 


a sub-Gaussian distribution A4g,o respecting (10) for 0 ^ < oo, there exist some values 

uch that, if 

' “ (31) 


C,c,(f > 0, only depending on a, such that, if 

M ^ Cmax(e“^'H(/C, 


with %{JC,'q) the Kolmogorov g-entropy of IC and the local set JCrj := (/C — /C) n for rj > 0, 
then for $ 1), a dithering ^ {[0,5]), and the associated mapping A defined in 

(Q, we have with probability exceeding 1 — that for all pairs x,y £ K, with x — y ^ 

\'^\x,y)-{ff^\\x-y\\ \ ^ {e+^)\\x-y\\ + c'(|t|+<5e). (32) 


Proof. The proof sketch of Prop. is as follows: (i) given x,y £ we first show that the 
r.v. 'D^{x,y) concentrates with high probability around {^/'^\\x — y\\ up to a systematic bias 
-^^)||£c — y\\ due to the sub-Gaussian nature of $ and controlled by the anti-sparse level of 
X — y, (a) we take a finite covering of /C by a y-net Qrf <Z IC (for rj > 0) and we extend the 
concentration of 'D^{x,y) to all vectors of Qrj by union bound; (Hi) we show that the softened 
pseudo-distance T>^ is sufficiently continuous in a neighborood of each pair of vectors in Q 
which then allows us to extend (32) to all pair of vectors in JC, as stated by Prop. 




(i) Concentration of V^{x,y): Given a fixed pair x,y ^ , we show that V^{x,y) con¬ 

centrates around its mean by bounding its sub-Gaussian norm as defined in ([^. From (28), 
V^{x, y) = M~^ Yhi ''^ith the M random variables Z) := df{(fj x+f^i, (pj y+£,i) for 1 ^ i ^ M. 
However, the sum of D independent sub-Gaussian random variables {Xi, ■ ■ ■ ,X£i} is approxi¬ 
mately invariant under rotation [52], which means that 


- EX, 




(33) 


Therefore, from (33), we find 


(34) 


As shown in the following lemma (proved in App. by using Lemma ||Z ^||,^2 can be 
upper bounded (and with it, the sub-Gaussian norm of ^{x, y)). 

Lemma 2. Let us take p r\j ■^s^,a(0>l) andi r\j U{[0, (5]). For a fixed t G M, the random variable 
Z^ := df{p^ X + f, y + f) is sub-Gaussian with 'if2-norm bounded by 



\\z^\(p 2 ~ N + \\x — y\\- 

(35) 

Moreover, 

\E Z* - Psgix - y)\\ < \t\, 

(36) 

with psgix — y) = E ((^,03 

-y)\ = -y\\ i/v? ~ AA^(0,1). 
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Consequently, from (34) and (35), X := ~ is itself sub-Gaussian with 

ll^lli /'2 ^ + 1^1 + II® “ y\\- Therefore, from the tail bound ([^, there exists a c > 0 such that 

for any e > 0 


> ^('^ +1^1 + II* “ ^ID] < 2exp(-ce2M). 

i 

Since KZj = KZl and EZ^ = E|((/3,® — y)| = /^sg(® — y) for all i G [M], ( |3^ provides 

^ \'^Hx,y) - Msg(x -y)l - |EZ^ -fisg(x-y)l 
^ |T>‘(a;, 2 /) - Msg(x - y)\ - c'\t\, 

for some constant c' > 0, and 


E[\V\x,y) - i^sgix - y)\ > c'|t| +e((5 + |t| + \\x - y||)] ^ 2exp ( - ce^M). (37) 


(a) Extension to a coveriny of fC: Given a radius r/ > 0 to be specified later, let Qr^ 
an T/-net of /C, i.e., a finite vector set such that for any x ^ tC there exists a xq € Qrt with 
||a; — ccqII ^ rj- In particular, any vectors x,y € X can then be written as 


x = xo + x , y = yo + y , 


(38) 


for some XQ,yQ G and x',y' G (/C — /C) G rjE^. We also assume that the size of Qrj is minimal 

so that, by definition, log|^^| = 77(/C,r/), with 77 the Kolmogorov rj-entropy of X. 

Since there are no more than \Gn\‘^ distinct pairs of vectors in given f G M, a standard 
union bound over (37) shows that there exist some constant C,c',cl' > 0 such that, if M ^ 
Ce-‘^'H{X,y) 


IF’[ Va;o, 2/0 G Gr,, |T'‘(®o, 2/o) “ 7^sg(®o - Z/o)| ^ c'|t| + e (<5 + |f| + ||a;o - 2/oll)] 

^ 1 — exp ( — ce^M) ^ 1 — 2 exp ( — c"e^M). (39) 


(Hi) Extension to by continuity of : We can extend the event characterized in (39) 

to all pairs of vectors in X by analyzing the continuity property of P* in a limited neighborhood 
around the considered vectors. We propose here to analyze this continuity with respect to 72- 
perturbations of those vectors, as compared to 7i-perturbations in |44] . As will be clearer later, 
this allows us to reach a better control over M with respect to e. 


Lemma 3 (Gontinuity with respect to 72-perturbations). Let xo,yQ,x',y' G We assume 
that W^x' II ^ r]^/M, II^2/'II ^ V'/M for some T] > 0. Then for every t G M and P ^ 1 one has 


P*+’?^^(®o,yo)-4(|. + ;^) ^ V\xo + x',yo + y') ^ ^^,) + 4(T + ^). (40) 


The proof is given in App.|^ Interestingly, the following proposition proved in App.[E] shows 
that ||^®^|| and ||^2/^|| can indeed be bounded uniformly for all x',y' G Xrj := {X — X) H r/E^. 

Lemma 4 (Diameter stability under random projections). LetTZ C be bounded, i.e., ||77|| := 
suPijgT^ ||ri|| < oo and assume TZ 3 0. Then, for some c > 0, if 


M > 
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for $ ~ 1) o,nd with probability at least 1 — exp(—ca ‘^M), we have for all x £ TZ 

(41) 

i.e., ||$7^|| ^^/M\\^Z\\. 

For the sake of simplicity, we consider below the sub-Gaussian parameter a. as fixed and 
integrate it in explicit or hidden constants, as in the notations or Noting that 

\\1Cj^\\ ^ rj and using a union bound over (|3^ and (41), we get that if 

M > max{e~^'H{IC,r]),r]~'^w{JCr^f), 

with probability higher than 1 — 4exp(—c'e^M), for all XQ,yQ G Q^i and all x',y' G /C,y, 

- hsg{xo - yo)\ ^ c\t - yVP\ +e{6 + \t- yVP\ + ||a;o - VoW), (42) 

\j)t+vVP(^xo,yo) - ysg{xo - yo)\ ^ c\t + r]VP\ + e((5 + |f + r]VP\ + ||a;o - l/oll)> (43) 

||$a ;'||2 ^ r|^/M, ||$y '||2 ^ yy/M, (44) 

for some C, c, c' > 0 depending only on a. 

Therefore, for anyx,y G /C, using sequentially (38), (44), the upper bound given in Lemma|^ 
and (42) provides 

V\x,y) < p^-' 7 ^/P(a,o,yo) + 4(;4 + ^) 

^ (c + e)|t — r]y/P\ + y.sg{xo — yo) + e||®o “ l/oll + + 4(p + -^p)- 

However, given (f ~ ■^sg,a(0)l)) using Jensen’s inequality, the reverse triangular inequality 
and ([^, we find 

\hsg{xo - yo) - hsgix - y)\ = \E\{(p,xo - yQ)\-E\{(p,x - y)\\ 

^ IE|(vJ,a:')| +E\{(p,y')\ ^ 2y. 

Moreover, |||a;o — yo\\ — \\x — y\\\ ^ 2r/, so that, 

T>\x, y) - y,sg{x - y) e\\x - y\\ + (c + e)(|t| + yVP) + 2y + 2ey + e6 + 4(;4 + 

If a; — y G Eko, then (14) induces |ltsg(® — y) — — y||| ^ «^sg||® — y\\/y/l^ and assum¬ 

ing e < 1 , there exists a c > 0 such that 

V\x,y) - Qf)/^\\x - y\\ ^ (e-b ^)ll® - y\\ + c{\t\ + yy/P + y + e6 + js + 7 ^). 




(45) 


Taking P = e ^ ^ 1 and y = < be, which gives yy/P = be and y/y/P = be^ ^ be, we 

find for another c > 0 

V\x,y) - {^f/^\\x-y\\ ^ (e + ^)||a;-y|| +c(|t| + Je). 

Similarly, using ( |38| ), (44), the lower bound given in Lemma and ( |43[ ), we obtain 
P*(a;,y) - (|)V=||a;-y|| ^ -{e + ^)\\x-y\\ - c(|t|+Je). 

Finally, we have thus shown that there exist some c, c' > 0 such that for 

M > max{e~‘^n{fC,V^),-^w{fC^^)‘^), (46) 

with probability at least 1 — 4exp(—cT^M) the bound 

\V^(x,y)-{frf/^\\x-y\\\ ^ {e+^)\\x-y\\ + c(|t|-b <5e) 
holds for all a;, y G /C n Ekq, which finishes the proof of Prop. □ 


19 













As mentioned earlier, Prop.[^is thus obtained by simplifying the requirement (|31[) appearing 


in Prop. First, for a general bounded set /C, since the Sudakov ineqna 


ity in (F[l4|) provides 


^ , noticing that fCr, <Z {fC — 1C) and that (pj^ and (F4) provide w{IC^ 


w{)C — IC) ^ 2w{IC), we deduce that (46) holds if 








as imposed in (16). 


Second, in the case of a quantized embedding of the structured sets defined in the Introduc¬ 
tion (see Def. [^, we can even reach a much weaker condition on M. Indeed, for snch a set tC 
with d = ||/C||, from (|3b[) and the definition of w, we have for any t/ > 0 


w{Kr^)‘^ = w{jJC — K) f^rjB^)‘^ = d?w[{d ^IC — d ^/C) n (d ^ 


so that, from (3a), the right-hand side of (pTl) can be bounded as 


max(e V^ w{1Cr^f) ^ max(e log(l-F ^(/C)^) 


52,3. 


^u;(/C)^log(l-k-j&). 


52^3- 


This explains the simpler requirement dlT] ) needed for structured sets in Prop. 

Example: Let us conclude this section by deducing an upper bound on w‘^{IC) for the set 
/C := Sj n dB'^ (with d = ||/C|| > 0) of bounded iL-sparse vectors in an orthonormal basis 
T' G of We first notice that since rc(S^ n dB^) = w{'Lk H dB-^) by invariance over 

the orthogonal group On (see (F[T2|) in Table and from (fUZI), 

w{lC/\\lC\\)^ <K\ogN/K. 


Moreover, the Kolmogorov entropy is also invariant under On, Le., 'H{T,KC^dM^^rf) = H(S^n 
dB'^,T/) and it is known that (see, e.g., [16]) 

n{^K n dB^,r?) < log((^)(l + ^ A:iog(^(l + f)) < iLlog(f) log(l + ^), 

by using Stirling’s bound. This shows that F^(/C, r/) ^ t()(/C)^ log(l-|-^) with t()(/C)^ < KlogN/K. 
Additionally, since is invariant under dilation, d~^JC — d~^JC C C and 

w{{d-^lC - d-^JC) n eB^)2 ^ u>(E^^ n eB^)^ = e^w{T.'^K 

< e^2K\og{N/2K) < e^K\og{N/K), 


showing again, by matching with (3b), that we have w{lC)‘^ < K\og{N/K). 

This confirms that w{K,)'^ has the same upper bound than t(;(/C/||/C||)^. Therefore, for the 
structured set K, of bounded AT-sparse vectors, (46) (and therefore (fTT])) is then satisfied if 


M> ^Klog(f)log(l + 




7 Proof of Proposition 


Using the context defined in Prop. and for M satisfying (19), we are going to show the 
contraposition of (21), he., that with probability at least 1 — for some c > 0 and 
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for all a;, y G /C with x 
equivalently that 


y G '^Koi having \\x — y\\ > e involves Q{^x + / Q{^y + $.), or 


x-y\\>e => V{x,y)^jj, 


(47) 


from the definition of P in (24). 

The proof sketch is a follows. First, for some rj > 0, we create a finite r/-covering of the set 
^ C /C X /C of vector pairs whose difference belongs to '^Kq- Second, in order to show ( fTT] ), 
we leverage the continuity of the pseudo-distance P* under £ 2 -perturbations (Lemma [^, as 
it happens that all points of 1C are obtained by t' 2 -perturbations of the //-covering and that, 
moreover, those perturbations are stable under projections by $ (Lemma|^. Finally, we adjust 
T] and some additional parameters to show that, with high probability, the softened distance 
'D^{xQ,yQ), for some t depending on r/, is large enough over all pairs {xQ,yQ) of the covering 
compatible with ||a; — y|| ^ e, hence inducing (47). 


Let us define the set ^ = {(a:, y) G /C x /C : ® — y G C ICx JC. We introduce a minimal 

y-net Qn 1C oi K, with 0 < rj < e/2 to be specihed later, such that for all {x,y) G 1C, there 
exists a (£Co,yo) £ Or] with 

\\{x,y) - (®o,yo)|| ^ V, 

which also involves \\x — £Co|| ^ y and ||y — yg|| ^ y. 

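The size of this minimal $\eta$-net is bounded as $\log|\mathcal G_\eta| \le 2\mathcal H(\mathcal K, \eta/\sqrt 2)$. Indeed, by the semi-additivity of the Kolmogorov entropy [35, Theorem 2], $\bar{\mathcal K} \subset \mathcal K\times\mathcal K$ implies that $\mathcal H(\bar{\mathcal K}, \rho) \le \mathcal H(\mathcal K\times\mathcal K, \rho)$ for any $\rho > 0$. Since a $\rho$-net of $\mathcal K\times\mathcal K$ can be obtained by the product $\mathcal G_{\rho'}\times\mathcal G_{\rho'}$, with $\rho' = \rho/\sqrt 2$ and $\mathcal G_{\rho'}$ a $\rho'$-net of $\mathcal K$, we obtain $\mathcal H(\bar{\mathcal K}, \rho) \le 2\mathcal H(\mathcal K, \rho/\sqrt 2)$.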

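As in the proof of Prop. … in Sec. …, by construction, all $(x,y) \in \bar{\mathcal K}$ can also be written as

$$(x,y) = (x_0,y_0) + (x',y'),$$

with $(x_0,y_0) \in \mathcal G_\eta$ and $(x',y') \in (\bar{\mathcal K} - \bar{\mathcal K})\cap\eta\mathbb B^{2N}$. Notice that we also have $x', y' \in \mathcal K_\eta := (\mathcal K - \mathcal K)\cap\eta\mathbb B^N$, since $x, x_0, y, y_0 \in \mathcal K$ and $\max(\|x'\|, \|y'\|) \le \|(x',y')\| \le \eta$.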

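As stated by Lemma 4, the diameter of the local set $\mathcal K_\eta$ is stable with respect to random projections. Since $\|\mathcal K_\eta\| \le \eta$, there indeed exist two values $C, c > 0$, only depending on the sub-Gaussian norm $\alpha$, such that if

$$M \ge C\eta^{-2}\,w(\mathcal K_\eta)^2 \qquad (48)$$

and $\Phi \sim \mathcal N^{M\times N}_{\rm sg,\alpha}(0,1)$, we have with probability at least $1 - 2\exp(-cM)$,

$$\|\Phi\mathcal K_\eta\| := \sup_{u\in\mathcal K_\eta}\|\Phi u\| \le \eta\sqrt M. \qquad (49)$$

Therefore, $\|\Phi x'\| \le \eta\sqrt M$ and $\|\Phi y'\| \le \eta\sqrt M$ under the same conditions.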

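Moreover, if the previous event occurs, then Lemma 3 for $t = 0$ shows that, for any $P \ge 1$,

$$\mathcal D(x,y) = \mathcal D^0(x_0 + x', y_0 + y') \ge \mathcal D^{\eta\sqrt P}(x_0,y_0) - 4\Big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\Big). \qquad (50)$$

Consequently, for reaching $\mathcal D(x,y) \ge \delta/M$ as expressed in (47), since $\|x - y\| > \epsilon$ implies $\|x_0 - y_0\| \ge \epsilon - 2\eta$, the proof can be deduced if we can guarantee that, for all $(u,v) \in \mathcal G_\eta$ with $\|u - v\| \ge \epsilon - 2\eta$, the probability that $\mathcal D^{\eta\sqrt P}(u,v) \ge 4\big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\big) + \frac{\delta}{M}$ tends (exponentially) to one with $M$.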


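Let us upper bound the corresponding probability of failure. We can first observe the following result on a fixed pair of vectors, proved in App. F.

Lemma 5. Let $u, v$ be in $\mathbb R^N$ with $u - v \in \mathcal C_{K_0}$ for some $K_0 > 0$ and $\|u - v\| \le \epsilon_0$ for $\epsilon_0 > 0$. For $\delta > 0$, $t \ge 0$, $r \in [M-1]$, $\Phi \sim \mathcal N^{M\times N}_{\rm sg,\alpha}(0,1)$, $\xi \sim \mathcal U^M([0,\delta])$ and the pseudo-distance $\mathcal D^t$ defined in (25), we have

$$\mathbb P\Big[\mathcal D^t(u,v) \le \frac{\delta}{M}\,r\Big] \le \exp\Big(-\frac{(Mp - r)^2}{2Mp}\Big), \qquad (51)$$

with $p := \mathbb P[d^t(\varphi^\top u + \xi, \varphi^\top v + \xi) \neq 0]$, $\varphi \sim \mathcal N^N_{\rm sg,\alpha}(0,1)$, $\xi \sim \mathcal U([0,\delta])$. Moreover, if $\sqrt{K_0} \ge 16\kappa_{\rm sg}$,

$$p \ge \frac{\|u - v\|}{8(\delta + 2\epsilon_0)} - \frac{2t}{\delta + \epsilon_0}. \qquad (52)$$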


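From the discrete nature of $\mathcal D^t$, the previous lemma (with $t$ set to $\eta\sqrt P$) shows that, for a fixed pair of vectors $(u,v)$, $\mathcal D^{\eta\sqrt P}(u,v) \ge \frac{\delta}{M}(r+1)$ holds with probability at least $1 - \exp(-(Mp - r)^2/(2Mp))$. Moreover, if

$$\frac{r}{M} \ge \frac{4}{\delta}\Big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\Big), \qquad (53)$$

we have

$$\mathbb P\Big[\mathcal D^{\eta\sqrt P}(u,v) \ge \frac{\delta}{M}(r+1)\Big] \le \mathbb P\Big[\mathcal D^{\eta\sqrt P}(u,v) \ge 4\Big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\Big) + \frac{\delta}{M}\Big].$$

Therefore, setting $r = \lceil Mp/2\rceil \ge Mp/2$, (51) gives

$$\mathbb P\Big[\mathcal D^{\eta\sqrt P}(u,v) \ge 4\Big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\Big) + \frac{\delta}{M}\Big] \ge 1 - \exp\Big(-\frac{(Mp - r)^2}{2Mp}\Big) \ge 1 - 2\exp\Big(-\frac{Mp}{8}\Big),$$

if, from (53),

$$p \ge 8\Big(\frac{2}{P} + \frac{\eta}{\delta\sqrt P}\Big). \qquad (54)$$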

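Thus, we have to adjust $P$ and $\eta$ in order to satisfy (54). Noting that $\epsilon - 2\eta \le \|u - v\| \le 2$ if $\mathcal K \subset \mathbb B^N$, i.e., that we can set $\epsilon_0 = 2$ in Lemma 5, this adjustment can be done from (52), by imposing $\sqrt{K_0} \ge 16\kappa_{\rm sg}$, in the requirement

$$p \ge \frac{\epsilon - 2\eta}{16(2+\delta)} - \frac{2\eta\sqrt P}{2+\delta} \ge 8\Big(\frac{2}{P} + \frac{\eta}{\delta\sqrt P}\Big),$$

where the first inequality uses $8(\delta + 4) \le 16(\delta + 2)$.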


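A solution is to set, for some $c \ge 1$ and $d > 0$ to be specified later,

$$P = c^2\,\frac{2+\delta}{\epsilon} \ge 1 \quad\text{and}\quad \eta = d\,\frac{\epsilon^{3/2}}{\sqrt{2+\delta}} \le d\,\epsilon. \qquad (55)$$

Then

$$\epsilon - 2\eta \ge (1 - 2d)\epsilon, \quad \eta\sqrt P = cd\epsilon, \quad \frac{2}{P} = \frac{2\epsilon}{c^2(2+\delta)}, \quad \frac{\eta}{\delta\sqrt P} = \frac{d\,\epsilon^2}{c\,\delta(2+\delta)} \le \frac{2d}{c\,\delta}\,\frac{\epsilon}{2+\delta},$$

so that

$$\frac{\epsilon - 2\eta}{16(2+\delta)} - \frac{2\eta\sqrt P}{2+\delta} \ge \frac{1 - 2d - 32cd}{16}\,\frac{\epsilon}{2+\delta} \quad\text{and}\quad 8\Big(\frac{2}{P} + \frac{\eta}{\delta\sqrt P}\Big) \le \Big(\frac{16}{c^2} + \frac{16d}{c\,\delta}\Big)\frac{\epsilon}{2+\delta}.$$

Fixing $d = \frac12(32)^{-2}\frac{\delta}{2+\delta} \le \frac12(32)^{-2}$ and $c = 32$, a few estimations finally show that

$$\frac{1 - 2d - 32cd}{16} \ge \frac{1 - (32)^{-2} - \frac12}{16} \ge \frac{1}{33}, \qquad \frac{16}{c^2} + \frac{16d}{c\,\delta} = \frac{1}{64} + \frac{1}{4096(2+\delta)} \le \frac{1}{64} + \frac{1}{4096} < \frac{1}{33},$$

proving that, for our choice of parameters, i.e., for $P = (32)^2\frac{2+\delta}{\epsilon} \ge 1$ and $\eta = \frac12(32)^{-2}\,\delta\big(\frac{\epsilon}{2+\delta}\big)^{3/2}$, (54) can be satisfied since $\sqrt{K_0} \ge 16\kappa_{\rm sg}$. Moreover, for this choice of parameters, (52) provides

$$p \ge \frac{\epsilon}{33(2+\delta)}.$$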
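The parameter choice can be checked numerically. The Python sketch below evaluates, on a grid of $(\epsilon, \delta)$ with $0 < \epsilon \le 2$, the lower bound on $p$ coming from (52) with $\epsilon_0 = 2$ and $t = \eta\sqrt P$, the right-hand side of (54), and the target $\epsilon/(33(2+\delta))$, using $c = 32$ and $d = \frac12(32)^{-2}\delta/(2+\delta)$ as above. It only illustrates the inequalities on a finite grid and does not replace the estimates; the helper name `parameter_check` is hypothetical.

```python
import math


def parameter_check(eps: float, delta: float, c: float = 32.0):
    """Return (P, eta, p_lower, rhs_54, target) for the choice (55)
    with d = 0.5 * c**-2 * delta / (2 + delta)."""
    d = 0.5 * c**-2 * delta / (2 + delta)
    P = c**2 * (2 + delta) / eps
    eta = d * eps**1.5 / math.sqrt(2 + delta)
    sqrtP = math.sqrt(P)
    # lower bound on p from (52) with eps0 = 2 and t = eta * sqrt(P)
    p_lower = (eps - 2 * eta) / (16 * (2 + delta)) - 2 * eta * sqrtP / (2 + delta)
    # right-hand side of (54)
    rhs_54 = 8 * (2 / P + eta / (delta * sqrtP))
    target = eps / (33 * (2 + delta))
    return P, eta, p_lower, rhs_54, target


if __name__ == "__main__":
    for eps in (2.0, 1.0, 0.1, 0.01):
        for delta in (0.01, 1.0, 100.0):
            P, eta, p_lo, rhs, tgt = parameter_check(eps, delta)
            print(f"eps={eps:<5} delta={delta:<6} p_lower={p_lo:.3e} "
                  f"target={tgt:.3e} rhs(54)={rhs:.3e}")
```

On every grid point one observes `p_lower >= target >= rhs(54)`, matching the chain of inequalities derived above.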

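We are now ready to complete the proof. Using the previous developments, defining $\mathcal G'_\eta := \{(u,v) \in \mathcal G_\eta : \|u - v\| \ge \epsilon - 2\eta\} \subset \mathcal G_\eta$ with $\eta \simeq \delta\epsilon^{3/2}(2+\delta)^{-3/2}$ fixed as above and $\log|\mathcal G'_\eta| \le \log|\mathcal G_\eta| \le 2\mathcal H(\mathcal K, \eta/\sqrt 2)$ as explained before, by a simple union bound there exist some constants $C, c, c' > 0$ such that if

$$M \ge C\,\frac{2+\delta}{\epsilon}\,\mathcal H\Big(\mathcal K, c\,\delta\big(\tfrac{\epsilon}{2+\delta}\big)^{3/2}\Big),$$

then the event

$$\mathcal D^{\eta\sqrt P}(u,v) \ge 4\Big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\Big) + \frac{\delta}{M}, \quad \forall (u,v) \in \mathcal G'_\eta, \qquad (56)$$

holds with probability at least

$$1 - 2\exp\Big(2\mathcal H\big(\mathcal K, \tfrac{\eta}{\sqrt 2}\big) - \frac{Mp}{8}\Big) \ge 1 - 2\exp\Big(2\mathcal H\big(\mathcal K, \tfrac{\eta}{\sqrt 2}\big) - \frac{M\epsilon}{264(2+\delta)}\Big) \ge 1 - 2\exp\Big(-c'\,\frac{M\epsilon}{2+\delta}\Big).$$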


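Remembering that, for having (50), the diameter of $\mathcal K_\eta$ must remain small under random projections by $\Phi$ (as stated in (49)), so by imposing (48), we find again by union bound that, for some other constants $C, c, c' > 0$, if

$$M \ge C\max\Big(\delta^{-2}\big(\tfrac{2+\delta}{\epsilon}\big)^{3}\,w\big(\mathcal K_{c\delta(\epsilon/(2+\delta))^{3/2}}\big)^2,\ \frac{2+\delta}{\epsilon}\,\mathcal H\big(\mathcal K, c\delta(\tfrac{\epsilon}{2+\delta})^{3/2}\big)\Big), \qquad (57)$$

then, with probability at least $1 - 4\exp(-c'M\epsilon/(2+\delta))$, for all $x, y \in \mathcal K$ with $x - y \in \mathcal C_{K_0}$ and $\|x - y\| > \epsilon$, (50) combined with (56) provides

$$\mathcal D(x,y) \ge \frac{\delta}{M},$$

as requested at the beginning.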


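We conclude the proof by simplifying the general condition (57). First, for a general bounded set $\mathcal K$, Sudakov's inequality (see (14)) and Sec. … provide $\mathcal H(\mathcal K, \eta) \lesssim \eta^{-2}w(\mathcal K)^2$ and $w(\mathcal K_\eta) \le 2w(\mathcal K)$, so that (57) holds if

$$M \ge C\,\delta^{-2}\Big(\frac{2+\delta}{\epsilon}\Big)^{4}w(\mathcal K)^2$$

for another constant $C > 0$.

Second, if the set $\mathcal K$ is structured, then, from (…) and the same simplifications used for Prop. … to reach Prop. …, the right-hand side of (57) can be bounded by

$$\max\Big((\delta s)^{-2}w(\mathcal K_{c\delta s})^2,\ \frac{2+\delta}{\epsilon}\,\mathcal H(\mathcal K, c\delta s)\Big) \lesssim \max\Big(w(\mathcal K)^2,\ \frac{2+\delta}{\epsilon}\,w(\mathcal K)^2\log\big(1 + \tfrac{1}{c\delta s}\big)\Big) \lesssim \frac{2+\delta}{\epsilon}\,w(\mathcal K)^2\log\big(1 + \tfrac{1}{c\delta s}\big),$$

with $s := \epsilon^{3/2}/(2+\delta)^{3/2}$, which explains the requirement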
A On the absolute expectation of a difference of dithered floors


This short appendix proves the equality

$$\mathbb E\big|\lfloor x + \xi\rfloor - \lfloor y + \xi\rfloor\big| = |x - y|, \quad \forall x, y \in \mathbb R,\ \xi \sim \mathcal U([0,1]).$$

Denoting $a = \lfloor x\rfloor \in \mathbb Z$, $b = \lfloor y\rfloor \in \mathbb Z$, $x' = x - a \in [0,1)$ and $y' = y - b \in [0,1)$, since $\lfloor\lambda - n\rfloor = \lfloor\lambda\rfloor - n$ for any $\lambda \in \mathbb R$ and $n \in \mathbb Z$, we can always write

$$\mathbb E\big|\lfloor x + \xi\rfloor - \lfloor y + \xi\rfloor\big| = \mathbb E|a - b + X|,$$

with $X = \lfloor x' + \xi\rfloor - \lfloor y' + \xi\rfloor$. Without loss of generality, we can assume that the r.v. $X$ is nonnegative, i.e., $x' \ge y'$ (just swap the roles of $x$ and $y$ if this is not the case). Moreover, since $x', y' \in [0,1)$, $X \in \{0,1\}$ and

$$\mathbb P(X = 0) = \mathbb P(x' + \xi < 1,\ y' + \xi < 1) + \mathbb P(x' + \xi \ge 1,\ y' + \xi \ge 1) = \mathbb P(x' + \xi < 1) + \mathbb P(y' + \xi \ge 1) = 1 - x' + y'.$$

Therefore,

$$\mathbb E|a - b + X| = \big(|a - b| - |a - b + 1|\big)\,\mathbb P(X = 0) + |a - b + 1| = |a - b| - (x' - y')\big(|a - b| - |a - b + 1|\big). \qquad (58)$$

If $x' = y'$, then $\mathbb E|a - b + X| = |a - b| = |x - y|$. Let us now consider the case $x' > y'$. If $x - y \ge 0$, then $a - b \ge y' - x' > -1$ since $x' < 1$, i.e., $a - b \ge 0$ since $a - b \in \mathbb Z$. Consequently, (58) provides $\mathbb E|a - b + X| = a - b + x' - y' = x - y$. When $x - y < 0$, $b - a > x' - y' > 0$, i.e., $a - b \le a - b + 1 \le 0$, and we get $\mathbb E|a - b + X| = b - a - (x' - y') = y - x$. In summary, $\mathbb E|a - b + X| = |x - y|$ in all cases, which proves the result.
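The identity can also be checked exactly, since $\xi \mapsto |\lfloor x + \xi\rfloor - \lfloor y + \xi\rfloor|$ is piecewise constant on $[0,1)$ with at most two breakpoints, located where $x + \xi$ or $y + \xi$ crosses an integer. The following Python sketch (a verification aid; the function name is hypothetical) integrates it over these pieces.

```python
import math


def expected_dithered_floor_gap(x: float, y: float) -> float:
    """Exact value of E|floor(x + xi) - floor(y + xi)| for xi ~ U([0, 1))."""
    # floor(x + xi) jumps at xi = (-x) mod 1, and similarly for y.
    cuts = sorted({0.0, 1.0, (-x) % 1.0, (-y) % 1.0})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)  # the integrand is constant on (lo, hi)
        total += (hi - lo) * abs(math.floor(x + mid) - math.floor(y + mid))
    return total


if __name__ == "__main__":
    for x, y in [(0.3, 2.75), (-1.2, 0.4), (5.0, 5.0), (3.9, -7.05)]:
        print(x, y, expected_dithered_floor_gap(x, y), abs(x - y))
```

For instance, with $x = 0.3$ and $y = 2.75$ the three pieces $[0, 0.25)$, $[0.25, 0.7)$ and $[0.7, 1)$ contribute $0.5 + 1.35 + 0.6 = 2.45 = |x - y|$.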


B Proof of Lemma 1

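We start by observing that

$$|d^t(a,b) - d^s(a,b)| \le \delta\sum_{k\in\mathbb Z}\big|\mathbb I[\mathcal F^t(a - k\delta, b - k\delta)] - \mathbb I[\mathcal F^s(a - k\delta, b - k\delta)]\big| = \delta\sum_{k\in\mathbb Z}\mathbb I[\mathcal H^{t,s}(a - k\delta, b - k\delta)],$$

with

$$\mathcal H^{t,s}(a,b) := \mathcal F^t(a,b)\,\triangle\,\mathcal F^s(a,b) := \big(\mathcal F^t(a,b)\cup\mathcal F^s(a,b)\big)\setminus\big(\mathcal F^t(a,b)\cap\mathcal F^s(a,b)\big).$$

For $t \ge s$, $\mathcal F^t(a,b) \subset \mathcal F^s(a,b)$ and $\mathcal H^{t,s}(a,b) = \mathcal F^s(a,b)\setminus\mathcal F^t(a,b)$, while for $t < s$, $\mathcal H^{t,s}(a,b) = \mathcal F^t(a,b)\setminus\mathcal F^s(a,b)$. Moreover, a careful piecewise analysis of the different sign combinations of $s$ and $t$ shows that $\mathcal H^{t,s}(a,b) \subset \{|a| \in [r_-, r_+]\}\cup\{|b| \in [r_-, r_+]\}$ with $r_+ := \max(|s|,|t|)$ and $r_-$ equal to $\min(|s|,|t|)$ if $ts \ge 0$ and to $0$ otherwise. Consequently, writing $r = r_+ - r_- \le |t - s|$, and since each of the two intervals composing $\{\lambda : |\lambda| \in [r_-, r_+]\}$ has length $r$ and thus contains at most $r/\delta + 1$ points of $\delta\mathbb Z$,

$$|d^t(a,b) - d^s(a,b)| \le \delta\sum_{k\in\mathbb Z}\mathbb I\big[\{|a - k\delta| \in [r_-, r_+]\}\cup\{|b - k\delta| \in [r_-, r_+]\}\big] \le 4\delta\Big(\frac{r}{\delta} + 1\Big) \le 4(|t - s| + \delta).$$

Moreover, if $s = 0$, since then $r_- = 0$ and $r_+ = r = |t|$,

$$\delta\sum_{k\in\mathbb Z}\mathbb I\big[\{|a - k\delta| \le |t|\}\cup\{|b - k\delta| \le |t|\}\big] \le 2\delta\Big(\frac{2|t|}{\delta} + 1\Big) = 4|t| + 2\delta,$$

and we find

$$\big|d^t(a,b) - |a - b|\big| \le |d^t(a,b) - d(a,b)| + \big|d(a,b) - |a - b|\big| = |d^t(a,b) - d^0(a,b)| + \big||\mathcal Q(a) - \mathcal Q(b)| - |a - b|\big| \le (4|t| + 2\delta) + 2\delta = 4(|t| + \delta).$$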
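To experiment with Lemma 1, one needs an explicit $d^t$. The Python sketch below assumes (this is our reading, since (26) and (27) are not reproduced here) that $d^t(a,b) = \delta\sum_{k\in\mathbb Z}\mathbb I[\mathcal F^t(a - k\delta, b - k\delta)]$ with $\mathcal F^t(a,b) = \{a > t, b < -t\}\cup\{a < -t, b > t\}$, i.e., $d^t$ counts the thresholds $k\delta$ separating $a$ and $b$ with margin $t$. It then checks both bounds of the lemma on random inputs; the names `d_t` and `check_lemma1` are hypothetical.

```python
import math
import random


def d_t(a: float, b: float, t: float, delta: float) -> float:
    """Hypothetical soft quantized distance: delta times the number of
    thresholds k*delta separating a and b with margin t (assumed form)."""
    r = abs(t)
    # only thresholds in (min(a,b) - |t|, max(a,b) + |t|) can qualify
    k_lo = math.floor((min(a, b) - r) / delta) - 1
    k_hi = math.ceil((max(a, b) + r) / delta) + 1
    count = 0
    for k in range(k_lo, k_hi + 1):
        u, v = a - k * delta, b - k * delta
        if (u > t and v < -t) or (u < -t and v > t):
            count += 1
    return delta * count


def check_lemma1(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
        t, s = rng.uniform(-1, 1), rng.uniform(-1, 1)
        delta = rng.uniform(0.05, 1.0)
        # |d^t - d^s| <= 4(|t - s| + delta)
        if abs(d_t(a, b, t, delta) - d_t(a, b, s, delta)) > 4 * (abs(t - s) + delta) + 1e-12:
            return False
        # |d^t - |a - b|| <= 4(|t| + delta)
        if abs(d_t(a, b, t, delta) - abs(a - b)) > 4 * (abs(t) + delta) + 1e-12:
            return False
    return True


if __name__ == "__main__":
    print(check_lemma1())
```

With $t = 0$ the assumed $d^t$ reduces to $|\mathcal Q(a) - \mathcal Q(b)|$ away from grid points, e.g., `d_t(0.3, 2.75, 0.0, 0.5)` equals $2.5$.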
C Proof of Lemma 2


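Let us define $Z := |\varphi^\top(x - y)| = |a - b|$ with the two r.v.'s $a = \varphi^\top x + \xi$ and $b = \varphi^\top y + \xi$, and accordingly $Z^t = d^t(a,b)$. From (13), $\mathbb E Z = \mathbb E Z^0$. Moreover, from the approximate rotational invariance property (33), $Z$ is sub-Gaussian with $\|Z\|_{\psi_2} = \|\varphi^\top(x - y)\|_{\psi_2} \lesssim \|x - y\|$, and using Lemma 1 and the bound $\|X\|_{\psi_2} \lesssim \|X\|_\infty$ for any bounded r.v. $X$, we find

$$\|Z^t\|_{\psi_2} \le \|Z^t - Z\|_{\psi_2} + \|Z\|_{\psi_2} \lesssim \big\|d^t(a,b) - |a - b|\big\|_\infty + \|x - y\| \lesssim \delta + |t| + \|x - y\|,$$

which demonstrates the sub-Gaussianity of $Z^t$.

For the expectation, writing $a = a' + \xi$ and $b = b' + \xi$ with $a' = \varphi^\top x$ and $b' = \varphi^\top y$, by Jensen's inequality and the law of total expectation, we find

$$|\mathbb E Z^t - \mathbb E Z^0| \le \mathbb E|Z^t - Z^0| = \mathbb E_\varphi\,\mathbb E_\xi\big|d^t(a' + \xi, b' + \xi) - d(a' + \xi, b' + \xi)\big|.$$

However, reusing some elements of the proof of Lemma 1 and considering $\varphi$ fixed,

$$\mathbb E_\xi\big|d^t(a' + \xi, b' + \xi) - d(a' + \xi, b' + \xi)\big| \le \delta\sum_{k\in\mathbb Z}\mathbb E_\xi\,\mathbb I\big[\{|a' + \xi - k\delta| \le |t|\}\cup\{|b' + \xi - k\delta| \le |t|\}\big]$$
$$\le \delta\sum_{k\in\mathbb Z}\mathbb E_\xi\,\mathbb I\big[|a' + \xi - k\delta| \le |t|\big] + \delta\sum_{k\in\mathbb Z}\mathbb E_\xi\,\mathbb I\big[|b' + \xi - k\delta| \le |t|\big].$$

Moreover, since $\xi \sim \mathcal U([0,\delta])$,

$$\delta\sum_{k\in\mathbb Z}\mathbb E_\xi\,\mathbb I\big[|a' + \xi - k\delta| \le |t|\big] = \sum_{k\in\mathbb Z}\int_0^\delta\mathbb I\big[|a' + s - k\delta| \le |t|\big]\,ds = \int_{\mathbb R}\mathbb I\big[|a' + s| \le |t|\big]\,ds = 2|t|,$$

which also provides $\delta\sum_{k\in\mathbb Z}\mathbb E_\xi\,\mathbb I[|b' + \xi - k\delta| \le |t|] = 2|t|$. Consequently, since these two quantities do not depend on $\varphi$, we find $|\mathbb E Z^t - \mathbb E Z^0| \le 4|t|$. Finally, if $\varphi \sim \mathcal N^N(0,1)$, then $\varphi^\top(x - y) \sim \mathcal N(0, \|x - y\|^2)$, and $\mathbb E Z^0 = \mathbb E Z = \sqrt{2/\pi}\,\|x - y\|$.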
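The identity $\delta\sum_k\mathbb E_\xi\,\mathbb I[|a' + \xi - k\delta| \le |t|] = 2|t|$ holds for every $a'$, which is why the bound does not depend on $\varphi$. The Python sketch below (function name hypothetical) computes the left-hand side exactly, as the total length of the overlaps between $[0,\delta]$ and the windows $[k\delta - |t| - a', k\delta + |t| - a']$.

```python
import math


def dithered_window_mass(a: float, t: float, delta: float) -> float:
    """delta * sum_k P_xi(|a + xi - k*delta| <= |t|) for xi ~ U([0, delta]).

    Each term is the length of [k*delta - |t| - a, k*delta + |t| - a]
    intersected with [0, delta], i.e., delta times a probability."""
    r = abs(t)
    # only windows meeting [0, delta] contribute: a - r <= k*delta <= a + r + delta
    k_min = math.floor((a - r) / delta) - 1
    k_max = math.ceil((a + r + delta) / delta) + 1
    total = 0.0
    for k in range(k_min, k_max + 1):
        lo = max(0.0, k * delta - r - a)
        hi = min(delta, k * delta + r - a)
        total += max(0.0, hi - lo)
    return total


if __name__ == "__main__":
    for a, t, delta in [(0.37, 0.1, 0.5), (-3.2, 1.7, 0.3), (12.0, 0.0, 1.0)]:
        print(a, t, delta, dithered_window_mass(a, t, delta), 2 * abs(t))
```

Overlapping windows (when $2|t| > \delta$) are deliberately counted several times, exactly as the sum over $k$ does.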


D Proof of Lemma 3


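We adapt the proof of Lemma 5.5 in […] to both $\ell_2$-perturbations (instead of $\ell_1$ ones) of $x_0$ and $y_0$, and to the context of uniform dithered quantization instead of 1-bit (sign) quantization. By assumption, we have $\|\Phi x'\| \le \eta\sqrt M$ and $\|\Phi y'\| \le \eta\sqrt M$. Therefore, the set

$$T := \{i \in [M] : |(\Phi x')_i| \le \eta\sqrt P,\ |(\Phi y')_i| \le \eta\sqrt P\}$$

is such that $|T^c| \le 2M/P$, as $2\eta^2 M \ge \|(\Phi x')_T\|^2 + \|(\Phi y')_T\|^2 + |T^c|P\eta^2 \ge |T^c|P\eta^2$. Considering the definition in (26), we have, for all $i \in T$ and any $\lambda \in \mathbb R$,

$$\mathcal E_i^{t+\eta\sqrt P}(x_0, y_0, \lambda) \subset \mathcal E_i^t(x_0 + x', y_0 + y', \lambda) \subset \mathcal E_i^{t-\eta\sqrt P}(x_0, y_0, \lambda),$$

with $\mathcal E_i^t(x_0, y_0, \lambda) := \mathcal F^t(\varphi_i^\top x_0 + \xi_i - \lambda, \varphi_i^\top y_0 + \xi_i - \lambda)$.

Denoting $\alpha_i = \max(|\varphi_i^\top x'|, |\varphi_i^\top y'|)$, we find

$$\mathcal D^{t+\eta\sqrt P}(x_0, y_0) = \frac{\delta}{M}\sum_{i=1}^M\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^{t+\eta\sqrt P}(x_0, y_0, k\delta)\big]$$
$$\le \frac{\delta}{M}\sum_{i\in T}\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^t(x_0 + x', y_0 + y', k\delta)\big] + \frac{\delta}{M}\sum_{i\in T^c}\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^{t+\eta\sqrt P-\alpha_i}(x_0 + x', y_0 + y', k\delta)\big]$$
$$\le \frac{\delta}{M}\sum_{i\in T}\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^t(x_0 + x', y_0 + y', k\delta)\big] + \frac{\delta}{M}\sum_{i\in T^c}\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^t(x_0 + x', y_0 + y', k\delta)\big]$$
$$\quad + \frac{\delta}{M}\sum_{i\in T^c}\sum_{k\in\mathbb Z}\Big|\mathbb I\big[\mathcal E_i^{t+\eta\sqrt P-\alpha_i}(x_0 + x', y_0 + y', k\delta)\big] - \mathbb I\big[\mathcal E_i^t(x_0 + x', y_0 + y', k\delta)\big]\Big|.$$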







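Using (29) to bound the last sum of the last expression and since, by definition of $T$, $\alpha_i \ge \eta\sqrt P$ for $i \in T^c$, we find

$$\mathcal D^{t+\eta\sqrt P}(x_0, y_0) \le \mathcal D^t(x_0 + x', y_0 + y') + \frac{4}{M}\sum_{i\in T^c}\big(\delta + \alpha_i - \eta\sqrt P\big)$$
$$\le \mathcal D^t(x_0 + x', y_0 + y') + \frac{4\delta|T^c|}{M} + \frac{4}{M}\sum_{i\in T^c}\big(\alpha_i - \eta\sqrt P\big)$$
$$\le \mathcal D^t(x_0 + x', y_0 + y') + \frac{8\delta}{P} + \frac{4}{M}\sum_{i\in T^c}\alpha_i - 4\eta\sqrt P\,\frac{|T^c|}{M}.$$

However,

$$\frac{1}{M}\sum_{i\in T^c}\alpha_i \le \frac{1}{M}\big(\|(\Phi x')_{T^c}\|_1 + \|(\Phi y')_{T^c}\|_1\big) \le \frac{\sqrt{|T^c|}}{M}\big(\|(\Phi x')_{T^c}\| + \|(\Phi y')_{T^c}\|\big) \le 2\eta\sqrt{\frac{|T^c|}{M}},$$

and since $f(\tau) = 2\tau - \tau^2\sqrt P \le 1/\sqrt P$ for all $\tau \in \mathbb R$, we find

$$\mathcal D^{t+\eta\sqrt P}(x_0, y_0) \le \mathcal D^t(x_0 + x', y_0 + y') + \frac{8\delta}{P} + 4\eta\,f\Big(\sqrt{\tfrac{|T^c|}{M}}\Big) \le \mathcal D^t(x_0 + x', y_0 + y') + 4\Big(\frac{2\delta}{P} + \frac{\eta}{\sqrt P}\Big),$$

which provides the lower bound of (40).

For the upper bound,

$$\mathcal D^{t-\eta\sqrt P}(x_0, y_0) \ge \frac{\delta}{M}\sum_{i\in T}\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^t(x_0 + x', y_0 + y', k\delta)\big] + \frac{\delta}{M}\sum_{i\in T^c}\sum_{k\in\mathbb Z}\mathbb I\big[\mathcal E_i^{t-\eta\sqrt P+\alpha_i}(x_0 + x', y_0 + y', k\delta)\big]$$
$$\ge \mathcal D^t(x_0 + x', y_0 + y') - \frac{\delta}{M}\sum_{i\in T^c}\sum_{k\in\mathbb Z}\Big|\mathbb I\big[\mathcal E_i^t(x_0 + x', y_0 + y', k\delta)\big] - \mathbb I\big[\mathcal E_i^{t-\eta\sqrt P+\alpha_i}(x_0 + x', y_0 + y', k\delta)\big]\Big|,$$

and, as above, the last sum can be upper-bounded by $4(2\delta/P + \eta/\sqrt P)$ using (29).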

E Proof of Lemma 4

We use here a similar proposition of Mendelson et al. in [39] for subsets of the unit sphere, which we lift to subsets of $\mathbb R^N$ thanks to some tools developed in […] for other purposes.

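We fix $t = \|\mathcal R\|/\sqrt 6$ and form the set $\mathcal R' := \{u/\|u\| : u \in \mathcal R\oplus t\}$ with $\mathcal R\oplus t := \{(x;t) : x \in \mathcal R\} \subset \mathbb R^{N+1}$. As $\mathcal R' \subset \mathbb S^N$, we know from [39, Theorem 2.1] that, for $0 < \epsilon < 1$, if

$$M \ge C\,\frac{\alpha^4}{\epsilon^2}\,w(\mathcal R')^2$$

and $\tilde\Phi \sim \mathcal N^{M\times(N+1)}_{\rm sg,\alpha}(0,1)$, then

$$\mathbb P\Big[\sup_{x'\in\mathcal R'}\Big|\frac{1}{M}\|\tilde\Phi x'\|^2 - 1\Big| \ge \epsilon\Big] \le \exp\Big(-c\,\frac{\epsilon^2 M}{\alpha^4}\Big).$$

However, for $g \sim \mathcal N^N(0,1)$ and $\gamma \sim \mathcal N(0,1)$, as observed similarly in […],

$$w(\mathcal R') = \mathbb E\sup_{x\in\mathcal R}\frac{|\langle g, x\rangle + t\gamma|}{(\|x\|^2 + t^2)^{1/2}} \le \frac{1}{t}\Big(\mathbb E\sup_{x\in\mathcal R}|\langle g, x\rangle| + t\,\mathbb E|\gamma|\Big) = \frac{1}{t}\Big(w(\mathcal R) + t\sqrt{2/\pi}\Big) \le 4\,\frac{w(\mathcal R)}{\|\mathcal R\|},$$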


(Footnote) In [39], a totally equivalent sub-Gaussian norm is used, i.e., $\|X\|_{\psi_2}^{(\mathrm{Mend.})} := \inf\{s > 0 : \mathbb E\exp(X^2/s^2) \le 2\}$, with $\|X\|_{\psi_2}^{(\mathrm{Mend.})} \simeq \|X\|_{\psi_2}$.
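since, for all $x \in \mathcal R$, $w(\mathcal R) \ge \sqrt{2/\pi}\,\|x\|$, i.e., $w(\mathcal R) \ge \sqrt{2/\pi}\,\|\mathcal R\|$ (and $\sqrt 6 + 1 \le 4$). Therefore, fixing $\epsilon = 1/2$, if $M \ge C\alpha^4 w(\mathcal R)^2/\|\mathcal R\|^2$, with probability at least $1 - \exp(-cM/\alpha^4)$, we have, for all $x \in \mathcal R$,

$$\frac{\|\Phi x\|}{t\sqrt M} \le \frac{\|\tilde\Phi(x;t)\|}{t\sqrt M} + \frac{\|\tilde\Phi(0;1)\|}{\sqrt M} = \frac{\|(x;t)\|}{t}\,\frac{\|\tilde\Phi x'\|}{\sqrt M} + \frac{\|\tilde\Phi(0;1)\|}{\sqrt M} \le \sqrt{\tfrac32}\Big(\frac{\|(x;t)\|}{t} + 1\Big),$$

where $x' = \|(x;t)\|^{-1}(x;t) \in \mathcal R'$, $\tilde\varphi_{N+1} \in \mathbb R^M$ is the last column of $\tilde\Phi$ (so that $\tilde\Phi(x;t) = \Phi x + t\tilde\varphi_{N+1}$, with $\Phi$ made of the first $N$ columns of $\tilde\Phi$), and using the fact that $(0;1) \in \mathcal R'$ since $0 \in \mathcal R$. Therefore, replacing $t$ by its value, we find with the same probability

$$\frac{1}{\sqrt M}\|\Phi x\| \le \sqrt{\tfrac32}\Big(\sqrt{\tfrac76} + \tfrac{1}{\sqrt 6}\Big)\|\mathcal R\| \le 2\|\mathcal R\|$$

for all $x \in \mathcal R$, i.e., $\|\Phi\mathcal R\| \le 2\sqrt M\,\|\mathcal R\|$.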


F Proof of Lemma 5


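From the relation $\mathcal D^t(u,v) = \frac{1}{M}\sum_{j=1}^M d^t(\varphi_j^\top u + \xi_j, \varphi_j^\top v + \xi_j)$ established in Sec. … between $\mathcal D^t$ and $d^t \in \delta\mathbb N$ defined in (27), and associated with the vectorial mapping $u \in \mathbb R^N \mapsto \Phi u + \xi \in \mathbb R^M$ whose components are independent, we reach the bound (51) with the cdf of a binomial distribution: since

$$\mathbb P\Big[\frac{M}{\delta}\mathcal D^t(u,v) \le r\Big] \le \mathbb P\Big[\big|\{j \in [M] : d^t(\varphi_j^\top u + \xi_j, \varphi_j^\top v + \xi_j) \neq 0\}\big| \le r\Big] = \sum_{k=0}^{r}\binom{M}{k}p^k(1 - p)^{M-k},$$

Chernoff's inequality (valid for $r \le Mp$) can upper bound this binomial cdf with

$$\mathbb P\Big[\frac{M}{\delta}\mathcal D^t(u,v) \le r\Big] \le \exp\Big(-\frac{(Mp - r)^2}{2Mp}\Big). \qquad (59)$$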
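The Chernoff bound (59) can be compared numerically with the exact binomial cdf; it is valid for $0 \le r \le Mp$. The Python sketch below (helper names hypothetical) also checks the consequence used in the main proof, namely that $r = \lceil Mp/2\rceil$ leads to a failure probability of at most $2\exp(-Mp/8)$.

```python
import math


def binom_cdf(r: int, M: int, p: float) -> float:
    """P[Bin(M, p) <= r], computed exactly by summation."""
    return sum(math.comb(M, k) * p**k * (1 - p) ** (M - k) for k in range(r + 1))


def chernoff_lower_tail(r: int, M: int, p: float) -> float:
    """Chernoff bound exp(-(Mp - r)^2 / (2Mp)), valid for 0 <= r <= Mp."""
    return math.exp(-((M * p - r) ** 2) / (2 * M * p))


if __name__ == "__main__":
    M, p = 200, 0.1
    for r in (0, 5, 10, 15, 20):
        print(r, binom_cdf(r, M, p), chernoff_lower_tail(r, M, p))
```

When $Mp < 2$ the bound $2\exp(-Mp/8)$ exceeds one and is trivially true; otherwise $\lceil Mp/2\rceil \le Mp$ and (59) applies.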


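Let us now lower bound $p$. Defining $w = u - v \in \mathcal C_{K_0}$ and $\bar w = w/\|w\|$, the action of the dithering $\xi \sim \mathcal U([0,\delta])$ allows us to compute easily that

$$p = \mathbb E_\varphi\,\mathbb P_\xi\big[d^t(\varphi^\top u + \xi, \varphi^\top v + \xi) \neq 0\big] = \mathbb E\min\Big(1, \frac{(|\varphi^\top w| - 2t)_+}{\delta}\Big).$$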
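This closed form rests on an elementary fact about dithering: for fixed $a > b$ and $t \ge 0$, some threshold $k\delta$ satisfies $b + \xi + t < k\delta < a + \xi - t$, with $\xi \sim \mathcal U([0,\delta])$, with probability $\min(1, (a - b - 2t)_+/\delta)$. The Python sketch below checks this with a fine midpoint rule over $\xi$; it assumes the margin-$t$ separation rule just described, and the function names are hypothetical.

```python
import math


def crossing_probability(a: float, b: float, t: float, delta: float,
                         grid: int = 100_000) -> float:
    """Fraction of dithers xi in [0, delta] (midpoint rule) for which some
    threshold k*delta satisfies min(a,b) + xi + t < k*delta < max(a,b) + xi - t."""
    lo, hi = min(a, b), max(a, b)
    hits = 0
    for j in range(grid):
        xi = (j + 0.5) * delta / grid
        left, right = lo + xi + t, hi + xi - t
        k = math.floor(left / delta) + 1  # smallest k with k*delta > left
        if k * delta < right:
            hits += 1
    return hits / grid


def crossing_formula(a: float, b: float, t: float, delta: float) -> float:
    """Closed form min(1, (|a - b| - 2t)_+ / delta)."""
    return min(1.0, max(0.0, abs(a - b) - 2 * t) / delta)


if __name__ == "__main__":
    for case in [(1.0, 0.2, 0.1, 1.0), (4.1, 3.0, 0.3, 0.7), (-1.0, 1.0, 1.2, 0.3)]:
        print(case, crossing_probability(*case), crossing_formula(*case))
```

Averaging this closed form over $\varphi$, with $a - b = \varphi^\top w$, gives the expression for $p$ above.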

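In order to avoid any further singularity when $\delta \to 0$, we can benefit from the fact that $p \le 1$ and work with this slightly looser bound:

$$p \ge \mathbb E\min\Big(1, \frac{(|\varphi^\top w| - 2t)_+}{\epsilon_0 + \delta}\Big).$$

Moreover, with $\alpha = \|u - v\|/(\delta + \epsilon_0)$,

$$p \ge \mathbb E\min\Big(1, \alpha|\varphi^\top\bar w| - \frac{2t}{\delta + \epsilon_0}\Big) \ge \mathbb E\min\big(1, \alpha|\varphi^\top\bar w|\big) - \frac{2t}{\delta + \epsilon_0},$$

so that

$$p \ge \mathbb E\min(1, \alpha|g|) - \frac{2t}{\delta + \epsilon_0} - A, \qquad (60)$$

where $g \sim \mathcal N(0,1)$ and $A := \big|\mathbb E\min(1, \alpha|\varphi^\top\bar w|) - \mathbb E\min(1, \alpha|g|)\big|$.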

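We can upper bound $A$ from our assumptions on the sub-Gaussian vector $\varphi \sim \mathcal N^N_{\rm sg,\alpha}(0,1)$:

$$A = \Big|\int_0^1\mathbb P\big(\min(1, \alpha|\varphi^\top\bar w|) \ge u\big) - \mathbb P\big(\min(1, \alpha|g|) \ge u\big)\,du\Big| = \Big|\int_0^1\mathbb P\big(\alpha|\varphi^\top\bar w| \ge u\big) - \mathbb P\big(\alpha|g| \ge u\big)\,du\Big|$$
$$\le \alpha\int_0^\infty\big|\mathbb P(|\varphi^\top\bar w| \ge u) - \mathbb P(|g| \ge u)\big|\,du \le \frac{\|u - v\|}{\delta + \epsilon_0}\,\frac{\kappa_{\rm sg}}{\sqrt{K_0}},$$

where the last inequalities rely on assumption (…) (setting $u = \bar w$) and on the fact that $w \in \mathcal C_{K_0}$. Moreover, for lower-bounding $\mathbb E\min(1, \alpha|g|)$ in (60), we observe that $\min(1, \alpha x) = \alpha x - \alpha(x - 1/\alpha)_+$ for $x \in \mathbb R$. Therefore, defining $F(x) := \frac12\alpha x^2 - \frac12\alpha(x - 1/\alpha)_+^2 = \int_0^x\min(1, \alpha u)\,du$ and integrating by parts, we find

$$\mathbb E\min(1, \alpha|g|) = \mathbb E\big(|g|\,F(|g|)\big) \ge \sqrt{2/\pi}\;F\big(\sqrt{2/\pi}\big),$$

where in the last inequality we used Jensen's inequality and the convexity of $x \in \mathbb R_+ \mapsto xF(x)$.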
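It is easy to see that $2F(x) \ge \alpha x^2/(1 + \alpha x)$, so that

$$\mathbb E\min(1, \alpha|g|) \ge \frac12\Big(\frac{2}{\pi}\Big)^{3/2}\frac{\alpha}{1 + \sqrt{2/\pi}\,\alpha} \ge \frac14\,\frac{\alpha}{1 + \alpha}.$$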
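For a quick check of these two inequalities, $\mathbb E\min(1, \alpha|g|)$ has a closed form: with $c = 1/\alpha$, $\mathbb E\min(1, \alpha|g|) = \mathbb P(|g| \ge c) + \alpha\,\mathbb E[|g|\,\mathbb I(|g| < c)] = \operatorname{erfc}(c/\sqrt 2) + \alpha\sqrt{2/\pi}\,(1 - e^{-c^2/2})$. This formula is derived here only for illustration (the text does not use it); the Python sketch below, with hypothetical function names, compares it with both lower bounds.

```python
import math


def expected_clipped_abs_gauss(alpha: float) -> float:
    """E min(1, alpha*|g|) for g ~ N(0, 1) and alpha > 0 (closed form)."""
    c = 1.0 / alpha
    tail = math.erfc(c / math.sqrt(2.0))  # P(|g| >= 1/alpha)
    body = alpha * math.sqrt(2.0 / math.pi) * (1.0 - math.exp(-c * c / 2.0))
    return tail + body


def jensen_bound(alpha: float) -> float:
    """0.5 * (2/pi)^(3/2) * alpha / (1 + sqrt(2/pi) * alpha)."""
    s = math.sqrt(2.0 / math.pi)
    return 0.5 * s**3 * alpha / (1.0 + s * alpha)


if __name__ == "__main__":
    for alpha in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(alpha, expected_clipped_abs_gauss(alpha), jensen_bound(alpha),
              alpha / (4 * (1 + alpha)))
```

The constant $\frac14$ is loose for large $\alpha$, where the expectation tends to one, but it suffices for (52).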


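Finally, using $\|u - v\| \le \epsilon_0$,

$$p \ge \frac14\,\frac{\alpha}{1 + \alpha} - \frac{2t}{\delta + \epsilon_0} - \frac{\|u - v\|}{\delta + \epsilon_0}\,\frac{\kappa_{\rm sg}}{\sqrt{K_0}} \ge \frac{\|u - v\|}{4(\delta + 2\epsilon_0)} - \frac{\|u - v\|}{\delta + \epsilon_0}\,\frac{\kappa_{\rm sg}}{\sqrt{K_0}} - \frac{2t}{\delta + \epsilon_0},$$

the last expression providing (52) if $\sqrt{K_0} \ge 16\kappa_{\rm sg}$, since then $\frac{\|u - v\|}{\delta + \epsilon_0}\frac{\kappa_{\rm sg}}{\sqrt{K_0}} \le \frac{\|u - v\|}{8(\delta + 2\epsilon_0)}$.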


G A lower bound on the approximation error of the Mean Absolute Difference of a binomial random variable


This small section establishes a lower bound on the approximation error of the MAD $M_n := \mathbb E|\beta_n - \mathbb E\beta_n|$ of a binomial random variable $\beta_n \sim \mathrm{Bin}(n, 1/2)$ by a fraction of its standard deviation $\sigma_n := (\mathbb E|\beta_n - \mathbb E\beta_n|^2)^{1/2} = \sqrt n/2$. Curiously enough, we were unable to find a similar result in the literature, while an upper bound on this approximation error in $O(1/n)$ when $n$ increases is well known (see, e.g., […]). Specifically, we want to prove that

$$\big|M_n - \sqrt{2/\pi}\,\sigma_n\big| \ge C\,\sigma_n n^{-1},$$

for some absolute constant $C > 0$ and all $n \ge 1$.

We start from Stirling's approximation of the factorial with an error bound due to R. W. Gosper [53] and redeveloped more clearly in […] (see also […] for a similar bound):

$$n^n e^{-n}\sqrt{2\pi\big(n + \tfrac16\big)} \le n! \le n^n e^{-n}\sqrt{2\pi\big(n + \tfrac13\big)}. \qquad (61)$$
Moreover, De Moivre gave the following exact formula for $M_{2n}$ […]:

$$M_{2n} = n\,2^{-2n}\binom{2n}{n} = n\,2^{-2n}\,\frac{(2n)!}{(n!)^2}.$$

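Therefore, applying (61) to this formula and using $\sqrt{1 + x} \le 1 + \frac{x}{2}$ for $x \ge -1$, we find, for $n \ge 1$,

$$M_{2n} \le n\,2^{-2n}\,\frac{(2n)^{2n}e^{-2n}\sqrt{2\pi(2n + \frac13)}}{n^{2n}e^{-2n}\,2\pi(n + \frac16)} = \frac{n}{\sqrt{\pi(n + \frac16)}} = \sqrt{\tfrac{2}{\pi}}\,\sigma_{2n}\sqrt{\frac{6n}{6n + 1}} \le \sqrt{\tfrac{2}{\pi}}\,\sigma_{2n}\Big(1 - \frac{1}{12n + 2}\Big),$$

or equivalently

$$\sqrt{\tfrac{2}{\pi}}\,\sigma_{2n} - M_{2n} \ge \sqrt{\tfrac{2}{\pi}}\,\frac{\sigma_{2n}}{12n + 2} \ge C\,\sigma_{2n}(2n)^{-1},$$

with $C = \frac17\sqrt{2/\pi}$ (since $2n/(12n + 2) \ge 1/7$ for $n \ge 1$), which provides the result.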
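De Moivre's formula and the final lower bound are easy to check exactly for moderate $n$. The Python sketch below (function names hypothetical) computes $M_{2n}$ by direct summation with rational arithmetic, compares it with $n2^{-2n}\binom{2n}{n}$, and verifies $\sqrt{2/\pi}\,\sigma_{2n} - M_{2n} \ge \frac17\sqrt{2/\pi}\,\sigma_{2n}(2n)^{-1}$ for $1 \le n \le 60$.

```python
import math
from fractions import Fraction


def mad_binomial_half(n: int) -> Fraction:
    """Exact MAD E|beta_n - n/2| for beta_n ~ Bin(n, 1/2)."""
    half = Fraction(n, 2)
    total = sum(math.comb(n, k) * abs(Fraction(k) - half) for k in range(n + 1))
    return Fraction(total, 2**n)


def de_moivre(m: int) -> Fraction:
    """De Moivre's closed form M_{2m} = m 2^{-2m} binom(2m, m)."""
    return Fraction(m * math.comb(2 * m, m), 4**m)


def gap_and_bound(m: int):
    """Return (sqrt(2/pi) sigma_{2m} - M_{2m}, (1/7) sqrt(2/pi) sigma_{2m} / (2m))."""
    s = math.sqrt(2.0 / math.pi)
    sigma = math.sqrt(2 * m) / 2
    gap = s * sigma - float(de_moivre(m))
    return gap, s * sigma / (7 * 2 * m)


if __name__ == "__main__":
    for m in (1, 2, 5, 10, 50):
        print(m, float(de_moivre(m)), *gap_and_bound(m))
```

For $n = 1$, for instance, $M_2 = 1/2$ while $\sqrt{2/\pi}\,\sigma_2 \approx 0.564$, so the gap is about $0.064$, above the bound $\approx 0.040$.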